Unstable Data and the Unusual Case of the Prosody Excerpt in the Digital Library

Koeser, Rebecca Sutton; Naydan, Mary; Martin, Meredith

doi:10.63744/cTWwVSItf41f

Abstract

Stable data is essential for repeatable research, and cultural heritage data is an invaluable resource for computational humanities research, but fluctuations within this kind of data pose challenges to reproducible, repeatable, and follow-up research. This paper uses the case study of HathiTrust Digital Library content within the Princeton Prosody Archive (PPA), particularly excerpted and augmented content, as a window into the surprising instability of this large-scale data. We analyze the rate of change resulting from HathiTrust updates in PPA excerpted content, in all of PPA, and in all of HathiTrust. We use this case study to illuminate the degree of change in HathiTrust, one exemplar of a cultural heritage data aggregator, and to consider its implications for research; we think this degree of change is not well understood by computational researchers.
