Anthology of Computers and the Humanities · Volume 3

The Latin Language Evolved Over Time, Masked Models Disregard That

Miriam Cuscito1, Alfio Ferrara2 and Martin Ruskov3

  • 1 Department of Languages, Literatures, Cultures and Mediations, University of Milan, Milan, Italy
  • 2 Department of Informatics, University of Milan, Milan, Italy
  • 3 Department of Languages and Modern Cultures, University of Genoa, Genoa, Italy

Permanent Link: https://doi.org/10.63744/sLAHYnQdA8fu

Published: 21 November 2025

Keywords: historical adequacy, algorithmic bias, masked language models, Latin language, model evaluation

Abstract

Latin language models are rarely trained with consideration of important historical watersheds. Here we demonstrate how this leads to poor performance when specific socio-temporal contextualisation is sought, as is common in humanities research. We perform an evaluation that compares the historical adequacy of Latin language models, i.e. their ability to generate tokens representative of a historical period. We adopt a previously established method and refine it to overcome limitations due to Latin being an under-resourced language and one with an intense tradition of intertextuality. To do this, we extract word lists and concordances from the LatinISE corpus and use them to compare seven masked language models trained for Latin. We further perform statistical analysis of the results in order to identify the best- and worst-performing models in each of the historical contexts of interest. We show that BERT medieval multilingual best captures the Classical linguistic context. Four models are indistinguishably good in our evaluation of the Neo-Latin linguistic context. These findings have broad implications for wider historical language research and beyond. Among these, we emphasise the need to train historical language models with due attention to consistent historical periods, and we discuss the possible usefulness of noisy predictions. Research on historical language models provides a neat demonstration of how model biases can impact performance in specific domains.
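
As a minimal illustration of the kind of masked-token evaluation described above, the sketch below queries a masked language model for a masked Latin token and checks its top predictions against a period-specific word list. The model identifier, the example sentence and the word list are placeholders chosen for illustration; they are not the models, concordances or word lists used in the study.

```python
# Minimal sketch (not the paper's exact pipeline): query a masked LM on a
# Latin sentence and check whether its top predictions appear in a
# period-specific word list. Model name, sentence and vocabulary below are
# illustrative placeholders only.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Hypothetical concordance line with one token masked out.
sentence = "Gallia est omnis divisa in partes [MASK]."

# Hypothetical word list for the Classical period (illustrative only).
classical_vocab = {"tres", "quattuor", "multas"}

predictions = fill_mask(sentence, top_k=10)
hits = [p["token_str"].strip() for p in predictions
        if p["token_str"].strip().lower() in classical_vocab]

# The fraction of in-vocabulary predictions is one crude proxy for how well
# the model fits the lexicon of the period under consideration.
print(f"{len(hits)}/{len(predictions)} top predictions found in the period word list")
```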