Anthology of Computers and the Humanities · Volume 3

Says Who? Effective Zero-Shot Annotation of Focalization

Rebecca M. M. Hicke1,2, Yuri Bizzoni2, Pascale Feldkamp2 and Ross Deans Kristensen-McLachlan2,3,4

  • 1 Department of Computer Science, Cornell University, Ithaca, NY, USA
  • 2 Center for Humanities Computing, Aarhus University, Aarhus, Denmark
  • 3 Department of Linguistics, Cognitive Science, and Semiotics, Aarhus University, Aarhus, Denmark
  • 4 TEXT - Center for the Contemporary Cultures of Text, Aarhus University, Aarhus, Denmark

Permanent Link: https://doi.org/10.63744/xxqzxENxsh3b

Published: 21 November 2025

Keywords: focalization, computational literary studies, large language models, immersivity

Abstract

Focalization describes the way in which access to narrative information is restricted or controlled based on the knowledge available to the narrator. It is encoded via a wide range of lexico-grammatical features and is subject to reader interpretation. Even trained annotators frequently disagree on correct labels, suggesting that this task is both qualitatively and computationally challenging. In this work, we test how well five contemporary large language model (LLM) families and two baselines perform when annotating short literary excerpts for focalization. Despite the challenging nature of the task, we find that LLMs perform comparably to trained human annotators, with GPT-4o achieving an average F1 of 84.79%. Further, we demonstrate that the log probabilities output by GPT-family models frequently reflect the difficulty of annotating particular excerpts. Finally, we provide a case study analyzing sixteen Stephen King novels, demonstrating the usefulness of this approach for computational literary studies and the insights gleaned from examining focalization at scale.