Données et modèles pour le traitement des documents en néolatin: le cas Lambert Daneau (Data and Models for Processing Neo-Latin Documents: The Case of Lambert Daneau)

Goy, Floriane; Schürmann, Noemi; Manig, Benjamin; Colombo, Matteo; Zahnd, Ueli; Krauter, Stefan

doi:10.63744/5TcizCXUUTmJ

Anthology of Computers and the Humanities · Volume 4


Floriane Goy¹, Noemi Schürmann², Benjamin Manig², Matteo Colombo¹, Ueli Zahnd¹ and Stefan Krauter²

¹ Institut d’histoire de la Réformation, University of Geneva, Switzerland
² University of Zurich, Switzerland


Permanent Link: https://doi.org/10.63744/5TcizCXUUTmJ

Published: 21 May 2025

Keywords: Latin philology, Neo-Latin, layout analysis, automatic text recognition, linguistic normalisation, lemmatisation


Abstract

This article presents the construction of a corpus of sixteenth-century commentaries on the Epistles of Paul, based on the digitization of numerous printed works in Neo-Latin. Because this variety of Latin is still underrepresented in existing datasets, building the corpus required developing dedicated resources for training suitable models. The article describes the data and models prepared for automatic text recognition (ATR) post-correction and lemmatization, which are intended to enable systematic digital exploitation of the historical material.