Anthology of Computers and the Humanities · Volume 3

Automatic detection and classification of literary character properties in German narratives

Janis Pagel1 and Nils Reiter1

  • 1 Department for Digital Humanities, University of Cologne, Cologne, Germany

Permanent Link: https://doi.org/10.63744/UkAh6gmT12av

Published: 21 November 2025

Keywords: computational literary studies, literary character properties, transformers, large language models

Abstract

This work presents an approach to automatically (i) identify sentences in German narrative texts that contain properties of literary characters and (ii) assign categories to these sentences according to the kind of property described, using both coarse-grained categories (such as role, clothing, or physiognomy) and fine-grained categories (such as occupation, accessories, or face). To this end, we test different transformer-based models (BERT, ELECTRA, RoBERTa, Llama) and compare the results to simple baselines (majority, random, bag-of-words Naive Bayes). We find that an uncased ELECTRA model achieves promising results in identifying sentences that contain character properties (67% F1), while an uncased BERT model achieves the highest results in assigning coarse-grained categories to sentences (87% F1) and RoBERTa performs best in assigning fine-grained categories (80% F1). A LoRA-tuned Llama 3.1 large language model achieves scores comparable to the best encoder model on the coarse-grained task (81% F1), but still falls 6 percentage points short of the fine-tuned German BERT model.
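To make the encoder-based setup described in the abstract concrete, the sketch below shows how a German transformer model could be fine-tuned as a binary sentence classifier (property vs. no property) with the Hugging Face transformers library. The model checkpoint, file names, column names, and hyperparameters are illustrative assumptions and do not reflect the authors' actual training configuration or data splits.

```python
# Minimal sketch: fine-tune a German encoder model to decide whether a sentence
# describes a literary character property. Checkpoint, paths, and hyperparameters
# are assumptions for illustration only.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "deepset/gelectra-base"  # assumed German ELECTRA checkpoint as a stand-in

def load_split(path):
    # Expected columns: "sentence" (str) and "label" (0 = no property, 1 = property)
    return Dataset.from_pandas(pd.read_csv(path))

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=256)

train_ds = load_split("train.csv").map(tokenize, batched=True)
dev_ds = load_split("dev.csv").map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="property-detector",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=dev_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
print(trainer.evaluate())  # reports loss on the dev split; F1 would need a metric function
```

The coarse- and fine-grained classification tasks would follow the same pattern with num_labels set to the size of the respective category inventory, and the LoRA-tuned Llama 3.1 variant would instead use a parameter-efficient fine-tuning setup (e.g., the peft library) on a decoder model; neither is spelled out here.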