Anthology of Computers and the Humanities · Volume 2

There and back again—how to preserve your data during transformations

Robert Casties¹

¹ Digital Humanities Team, Max Planck Institute for the History of Science, Berlin, Germany

Permanent Link: https://doi.org/10.63744/EZFV8PwG9Frz

Published: 22 October 2025

Keywords: software, best practice, data integrity, digital humanities

Abstract

Datasets often need to be transformed: from an external file format into a database, from one database system into another, or from an outdated legacy system into an archive format. Every transformation carries the risk of data corruption through software errors. What can software developers do to check data integrity and ensure that no data is lost or changed in the process? This paper presents three basic approaches with different degrees of accuracy and complexity, each illustrated by a real-world example. The first example shows simple end-to-end statistics, counting entities during the import of a dataset into a database. The second, more complex example shows the bookkeeping of entities during the conversion of an HTML website. The last example shows a full round-trip migration and comparison of a project database and data model.
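The three approaches differ mainly in what they compare, and a small Python sketch can make that difference concrete. It is an illustration under stated assumptions, not the implementation used in the paper's examples. Records are assumed to be flat dictionaries with a hypothetical `type` field. The converter is assumed to log every source entity it handles in a hypothetical `ledger` dictionary. JSON serialization stands in for a real export and re-import.

```python
import json
from collections import Counter
from typing import Callable, Iterable, Mapping

# Assumption: a record is a flat dict with a hypothetical "type" field.
Record = Mapping[str, object]


def count_by_type(records: Iterable[Record], type_key: str = "type") -> Counter:
    """Approach 1 (end-to-end statistics): count entities per type."""
    return Counter(str(r[type_key]) for r in records)


def count_mismatches(source: Counter, target: Counter) -> dict:
    """Return {type: (source_count, target_count)} for every type whose counts differ."""
    return {
        t: (source[t], target[t])
        for t in sorted(source.keys() | target.keys())
        if source[t] != target[t]
    }


def unaccounted(source_ids: Iterable[object], ledger: Mapping[object, str]) -> set:
    """Approach 2 (bookkeeping): the converter is assumed to log every source
    entity in `ledger` (e.g. as "converted" or "skipped"); any ID missing
    from the ledger was silently lost."""
    return set(source_ids) - set(ledger)


def canonical(records: Iterable[Record]) -> list:
    """Order-independent, directly comparable form of a record set."""
    return sorted(json.dumps(r, sort_keys=True) for r in records)


def round_trip_ok(
    records: list,
    export: Callable[[list], str],
    reimport: Callable[[str], list],
) -> bool:
    """Approach 3 (round trip): convert there and back again, then compare."""
    return canonical(records) == canonical(reimport(export(records)))


if __name__ == "__main__":
    source = [
        {"id": 1, "type": "person", "name": "Ada"},
        {"id": 2, "type": "person", "name": "Bo"},
        {"id": 3, "type": "place", "name": "Berlin"},
    ]
    imported = source[:2]  # simulate an import that silently dropped a record
    print(count_mismatches(count_by_type(source), count_by_type(imported)))
    # {'place': (1, 0)}

    ledger = {1: "converted", 2: "skipped"}  # record 3 was never logged
    print(unaccounted((r["id"] for r in source), ledger))  # {3}

    lossless = round_trip_ok(source, json.dumps, json.loads)
    lossy = round_trip_ok(
        source,
        # an export that drops the "name" field
        lambda rs: json.dumps([{k: v for k, v in r.items() if k != "name"} for r in rs]),
        json.loads,
    )
    print(lossless, lossy)  # True False
```

The checks become stricter from top to bottom. Counts detect only missing or extra entities. The ledger identifies exactly which entity went missing. The round trip also detects changed field values, but it requires a reverse conversion.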