Anthology of Computers and the Humanities · Volume 4

Data and Models for Processing Neo-Latin Documents: The Case of Lambert Daneau

Floriane Goy¹, Noemi Schürmann², Benjamin Manig², Matteo Colombo¹, Ueli Zahnd¹ and Stefan Krauter²

  • ¹ Institut d’histoire de la Réformation, Université de Genève, Switzerland
  • ² Université de Zürich, Switzerland

Permanent Link: https://doi.org/10.63744/5TcizCXUUTmJ

Published: 21 May 2025

Keywords: Latin philology, Neo-Latin, layout analysis, automatic text recognition, linguistic normalisation, lemmatisation

Abstract

This article presents the construction of a corpus of sixteenth-century commentaries on the Epistles of Paul, based on the digitization of numerous printed works in Neo-Latin. Because this variety of Latin is still underrepresented in existing datasets, specific resources had to be developed to train suitable models. The article describes the data prepared and the models trained for post-correction of automatic text recognition (ATR) output and for lemmatization, which together enable systematic digital analysis of the historical material.