Anthology of Computers and the Humanities · Volume 3

Automatic detection and classification of literary character properties in German narratives

Janis Pagel1 and Nils Reiter1

  • 1 Department for Digital Humanities, University of Cologne, Cologne, Germany

Permanent Link: https://doi.org/10.63744/UkAh6gmT12av

Published: 21 November 2025

Keywords: computational literary studies, literary character properties, transformers, large language models

Abstract

This work presents an approach to automatically (i) identify sentences in German narrative texts that contain properties of literary characters and (ii) assign categories to these sentences according to the kind of property described, using both coarse-grained categories (such as role, clothing, or physiognomy) and fine-grained categories (such as occupation, accessories, or face). To this end, we test different transformer-based models (BERT, ELECTRA, RoBERTa, Llama) and compare the results to simple baselines (majority, random, bag-of-words Naive Bayes). We find that an uncased ELECTRA model achieves promising results in identifying sentences that contain character properties (67% F1), while an uncased BERT model achieves the highest results in assigning coarse-grained categories to sentences (87% F1) and RoBERTa performs best in assigning fine-grained categories (80% F1). A LoRA-tuned Llama 3.1 large language model achieves scores comparable to the best encoder model on the coarse-grained task (81% F1), but still falls 6 percentage points short of the fine-tuned German BERT model.
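To make the encoder-based setup described in the abstract concrete, the sketch below shows how a German transformer model could be fine-tuned as a binary sentence classifier (property vs. no property) with the Hugging Face transformers library. The model checkpoint, file names, column names, and hyperparameters are illustrative assumptions and do not reflect the authors' actual training configuration or data splits.

```python
# Minimal sketch: fine-tune a German encoder model to decide whether a sentence
# describes a literary character property. Checkpoint, paths, and hyperparameters
# are assumptions for illustration only.
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "deepset/gelectra-base"  # assumed German ELECTRA checkpoint as a stand-in

def load_split(path):
    # Expected columns: "sentence" (str) and "label" (0 = no property, 1 = property)
    return Dataset.from_pandas(pd.read_csv(path))

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, max_length=256)

train_ds = load_split("train.csv").map(tokenize, batched=True)
dev_ds = load_split("dev.csv").map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="property-detector",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=dev_ds,
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
print(trainer.evaluate())  # reports loss on the dev split; F1 would need a metric function
```

The coarse- and fine-grained classification tasks would follow the same pattern with num_labels set to the size of the respective category inventory, and the LoRA-tuned Llama 3.1 variant would instead use a parameter-efficient fine-tuning setup (e.g., the peft library) on a decoder model; neither is spelled out here.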