Automatic Named Entity Linking for Ancient Greek with a Domain-Specific Knowledge Base

Beersmans, Marijke; Graaf, Evelien de; Keersmaekers, Alek; Depauw, Mark; Cruys, Tim Van de; Fantoli, Margherita

doi:10.63744/kYt0eVdjxjnt

Abstract

Named Entity Linking, or disambiguating named entities by linking them to a knowledge base, is an important Natural Language Processing task, especially in the humanities. In this paper, we examine the performance of the state-of-the-art entity linking model BLINK in connecting Ancient Greek person mentions to a domain-specific German-language knowledge base. To train the model, we create both gold-standard data through manual annotation and noisier silver data through automatic extraction. We then evaluate whether adding the silver data improves performance. Our findings suggest that, overall, the results for Ancient Greek remain suboptimal. Increasing the amount of training data, even through automatic methods, shows promise. However, as it stands, BLINK used directly would be ill-suited to Named Entity Linking in the target setting. We discuss possible causes and suggest areas for improvement.¹

¹ This paper has been revised. A coding error, caused by a manual padding step, occurred during the conversion from the original BERT configuration to a RoBERTa-based one. Because BERT and RoBERTa use different padding token IDs, this step inadvertently caused the CLS token to be masked during training. This primarily affected the bi-encoder, where BLINK relies on CLS pooling, and to a lesser extent the cross-encoder. The code has since been corrected and the experiments rerun. This affected the results, some specific claims in the error analysis, and the training specifications for one model configuration, although the overall conclusions held. The authors apologize for this oversight. A document with a more detailed overview of our changes can be found on GitHub.
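The failure mode described in this note can be illustrated with a minimal Python sketch. It assumes the attention mask was derived by comparing token IDs against a padding ID hardcoded to BERT's value (0). In the standard roberta-base vocabulary, ID 0 is the start token `<s>`, which plays the role of CLS, while padding is ID 1; a domain-specific RoBERTa tokenizer may use different IDs. The helper `pad_and_mask`, the constants, and the word IDs are hypothetical and are not taken from the authors' code or from BLINK.

```python
# Illustrative special-token IDs from the standard bert-base and roberta-base
# vocabularies (assumption: a domain-specific tokenizer may differ).
BERT_PAD_ID = 0     # [PAD] in BERT
ROBERTA_CLS_ID = 0  # <s>, RoBERTa's CLS-equivalent start token
ROBERTA_PAD_ID = 1  # <pad> in RoBERTa
ROBERTA_SEP_ID = 2  # </s> in RoBERTa


def pad_and_mask(token_ids, max_len, pad_id):
    """Right-pad token_ids to max_len with pad_id and derive an attention mask
    (1 = real token, 0 = padding) by comparing every ID against pad_id.

    Hypothetical helper mirroring a manual padding step; not BLINK's code.
    """
    if len(token_ids) > max_len:
        raise ValueError("sequence longer than max_len")
    padded = token_ids + [pad_id] * (max_len - len(token_ids))
    mask = [int(tok != pad_id) for tok in padded]
    return padded, mask


# A RoBERTa-style mention encoding: <s> w1 w2 </s> (word IDs are made up).
mention = [ROBERTA_CLS_ID, 1234, 5678, ROBERTA_SEP_ID]

# Buggy: padding ID hardcoded to BERT's value, which collides with <s>.
_, buggy_mask = pad_and_mask(mention, 6, BERT_PAD_ID)
# Fixed: use the padding ID of the tokenizer actually in use.
_, fixed_mask = pad_and_mask(mention, 6, ROBERTA_PAD_ID)

print(buggy_mask)  # [0, 1, 1, 1, 0, 0]: the CLS position is treated as padding
print(fixed_mask)  # [1, 1, 1, 1, 0, 0]
```

With a Hugging Face tokenizer, the padding ID would normally be read from `tokenizer.pad_token_id` rather than hardcoded, which avoids this kind of mismatch when switching between model families.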
