Anthology of Computers and the Humanities · Volume 3

The Rest is Silence: Leveraging Unseen Species Models for Computational Musicology

Fabian C. Moss1, Jan Hajič jr.2, Adrian Nachtwey3, Laurent Pugin4 and Author Three2

  • 1 Institut für Musikforschung, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
  • 2 Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic
  • 3 KreativInstitut.OWL, Paderborn University, Paderborn, Germany
  • 4 RISM Digital Center, Bern, Switzerland

Permanent Link: https://doi.org/10.63744/tP4bLwLkye8B

Published: 21 November 2025

Keywords: Unseen Species Models, Computational Musicology, RISM, Gregorian Chant, Corpus Studies, Chord Vocabularies, Archives, Databases

Abstract

For many decades, musicologists have created large databases that serve different purposes in musicological research and scholarship. With the rise of fields like music information retrieval and digital musicology, there is now a constant and growing influx of musicologically relevant datasets and corpora. In historical or observational settings, however, these datasets are necessarily incomplete, and the true extent of a collection of interest remains unknown: silent. Here, we apply so-called Unseen Species models (USMs) from ecology to areas of musicological activity. After introducing the models formally, we show in four case studies how USMs can be applied to musicological data to address quantitative questions like: How many composers are we missing in RISM? What percentage of medieval sources of Gregorian chant have we already cataloged? How many differences in music prints do we expect to find between editions? How well do existing collections cover the songs from the genres of a folk music tradition? And, finally, how close are we to estimating the size of the harmonic vocabulary of a large number of composers?