Anthology of Computers and the Humanities · Volume 3

Llamas Don’t Understand Fiction: Application and Evaluation of Large Language Models for Knowledge Extraction from Short Stories in English

Arianna Graciotti1,2, Franziska Pannach1, Valentina Presutti1 and Federico Pianzola1

  • 1 Centre for Language and Cognition, University of Groningen, Groningen, Netherlands
  • 2 Department of Languages, Literatures and Modern Cultures, University of Bologna, Bologna, Italy

Permanent Link: https://doi.org/10.63744/iCGYNUN0uUAe

Published: 21 November 2025

Keywords: Event Extraction, Fiction, Human-Centered Evaluation, LLMs, Zero/Few-Shot Extraction

Abstract

Extracting event knowledge from unstructured text is a well-known challenge in Natural Language Processing (NLP) and is particularly difficult when dealing with fiction. Fictional narratives often convey information through subtext rather than explicit statements, and their figurative style further complicates event extraction. Recent advances in Large Language Models (LLMs) have improved performance across various NLP tasks. However, their effectiveness in extracting events from fiction remains underexplored. In this article, we evaluate the performance of open-weight LLMs in extracting character death events from fictional narratives in English. These events are defined as triples consisting of Victim, Perpetrator, and Mode of Demise. We cast Knowledge Extraction (KE) as a zero-shot task and evaluate our approach on a manually annotated benchmark of fanfiction stories. Our results show that LLMs struggle with KE from fiction, reaching a maximum F1-score of 0.45 across the elements constituting the triples and correctly extracting, at most, 25% of death events. A detailed error analysis reveals that most errors stem from missed death events and from direct presentation modes, such as direct speech, which significantly impair extraction performance. Moreover, KE accuracy declines as story length increases, while the LLMs' background knowledge leakage contributes to false positives. These findings provide domain-specific insights into the challenges of KE in fiction.
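
The abstract does not specify the prompts or models used; the following is a minimal, illustrative Python sketch of how a zero-shot death-triple extraction query might be issued to an open-weight instruction-tuned model. The model name, prompt wording, and JSON schema are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal zero-shot sketch: ask an open-weight LLM for character death
# triples (Victim, Perpetrator, Mode of Demise) as JSON. Model name,
# prompt wording, and output schema are illustrative assumptions only.
import json
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # any open-weight instruct model
)

PROMPT_TEMPLATE = (
    "Read the story below. List every character death it narrates as a JSON "
    'array of objects with keys "victim", "perpetrator", and '
    '"mode_of_demise". Use null when an element is not stated. Return only '
    "JSON, with no extra text.\n\nStory:\n{story}\n\nJSON:"
)

def extract_death_triples(story: str) -> list[dict]:
    """Run one zero-shot extraction query and parse the JSON reply."""
    out = generator(
        PROMPT_TEMPLATE.format(story=story),
        max_new_tokens=256,
        do_sample=False,          # deterministic decoding for evaluation
        return_full_text=False,   # keep only the model's continuation
    )[0]["generated_text"]
    try:
        return json.loads(out.strip())
    except json.JSONDecodeError:
        return []  # malformed output counts as a missed extraction

print(extract_death_triples(
    "The duel ended quickly: Aldric's blade found Corvin's heart."
))
```

In an evaluation of this kind, the predicted triples would be compared element by element against manual annotations, which is consistent with the per-element F1-scores the abstract reports.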