Anthology of Computers and the Humanities · Volume 3

Towards Comparable Historical NER: Building a Shared Evaluation Corpus for 18th-Century Historical Texts

Lu Liu1, Andreas Vlachidis1, Adam Crymble1, Deborah Lee1 and Marco Humbel1,2

  • 1 Department of Information Studies, University College London (UCL), London, United Kingdom
  • 2 The National Archives, London, United Kingdom

Permanent Link: https://doi.org/10.63744/dwCJ80qwvAtr

Published: 21 November 2025

Keywords: named entity recognition, evaluation corpus, historical documents, digital humanities, natural language processing

Abstract

Named Entity Recognition (NER) is increasingly applied to historical text analysis. However, differences in evaluation materials, metrics, and annotation guidelines across existing NER projects make it difficult to systematically compare different approaches to historical NER. This study addresses this issue by constructing an evaluation corpus through the normalization of four annotated datasets from the long 18th century. We evaluate the performance of the Edinburgh Geoparser, spaCy, and a BERT-based tool on this corpus using five evaluation modes. Results show that even under the most lenient criteria, the highest F1-score remains below 70%, highlighting the challenges of applying existing NER systems to historical texts. Through detailed error analysis, we identify common challenges such as spelling and formatting issues. These findings demonstrate the limitations of NER tools when applied to historical documents. We argue that future work should involve collaboration with historians to ensure that evaluation corpora align with real user needs.