Anthology of Computers and the Humanities · Volume 3

Unstable Data and the Unusual Case of the Prosody Excerpt in the Digital Library

Rebecca Sutton Koeser1, Mary Naydan1, and Meredith Martin1,2

  • 1 Center for Digital Humanities, Princeton University, Princeton, New Jersey, USA
  • 2 Department of English, Princeton University, Princeton, New Jersey, USA

Permanent Link: https://doi.org/10.63744/cTWwVSItf41f

Published: 21 November 2025

Keywords: humanities data, unstable data, reproducibility, digital libraries

Abstract

Stable data is essential for repeatable research, and cultural heritage data is an invaluable resource for computational humanities research, but fluctuations within this kind of data pose challenges to reproducible, repeatable, and follow-up research. This paper uses HathiTrust Digital Library content within the Princeton Prosody Archive (PPA), particularly excerpted and augmented content, as a case study and a window into the surprising instability of this large-scale data. We analyze the rate of change resulting from HathiTrust updates at three scales: PPA excerpted content, all of PPA, and all of HathiTrust. We use this case study to illuminate the degree of change in HathiTrust, one exemplar of a cultural heritage data aggregator, and to consider the implications for research. We think this degree of change is not well understood by computational researchers.