Anthology of Computers and the Humanities · Volume 3

Llamas Don’t Understand Fiction: Application and Evaluation of Large Language Models for Knowledge Extraction from Short Stories in English

Arianna Graciotti1,2, Franziska Pannach1, Valentina Presutti1 and Federico Pianzola1

  • 1 Centre for Language and Cognition, University of Groningen, Groningen, Netherlands
  • 2 Department of Languages, Literatures and Modern Cultures, University of Bologna, Bologna, Italy

Permanent Link: https://doi.org/10.63744/iCGYNUN0uUAe

Published: 21 November 2025

Keywords: Event Extraction, Fiction, Human-Centered Evaluation, LLMs, Zero/Few-Shot Extraction

Abstract

Extracting event knowledge from unstructured text is a well-known challenge in Natural Language Processing (NLP) and is particularly difficult when dealing with fiction. Fictional narratives often convey information through subtext rather than explicit statements, and their figurative style further complicates event extraction. Recent advances in Large Language Models (LLMs) have improved performance across various NLP tasks. However, their effectiveness in extracting events from fiction remains underexplored. In this article, we evaluate the performance of open-weight LLMs in extracting character death events from fictional narratives in English. These events are defined as triples consisting of Victim, Perpetrator, and Mode of Demise. We cast Knowledge Extraction (KE) as a zero-shot task and evaluate our approach on a manually annotated benchmark of fanfiction stories. Our results show that LLMs struggle with KE from fiction, reaching a maximum F1-score of 0.45 across the elements constituting the triples and correctly extracting, at most, 25% of death events. A detailed error analysis reveals that most errors stem from missed death events and from direct presentation modes, such as direct speech, which significantly impair extraction performance. Moreover, KE accuracy declines as story length increases, while the LLMs' background knowledge leakage contributes to false positives. These findings provide domain-specific insights into the challenges of KE in fiction.
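
The abstract does not specify the prompts or models used; the following is a minimal, illustrative Python sketch of how a zero-shot death-triple extraction query might be issued to an open-weight instruction-tuned model. The model name, prompt wording, and JSON schema are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal zero-shot sketch: ask an open-weight LLM for character death
# triples (Victim, Perpetrator, Mode of Demise) as JSON. Model name,
# prompt wording, and output schema are illustrative assumptions only.
import json
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # any open-weight instruct model
)

PROMPT_TEMPLATE = (
    "Read the story below. List every character death it narrates as a JSON "
    'array of objects with keys "victim", "perpetrator", and '
    '"mode_of_demise". Use null when an element is not stated. Return only '
    "JSON, with no extra text.\n\nStory:\n{story}\n\nJSON:"
)

def extract_death_triples(story: str) -> list[dict]:
    """Run one zero-shot extraction query and parse the JSON reply."""
    out = generator(
        PROMPT_TEMPLATE.format(story=story),
        max_new_tokens=256,
        do_sample=False,          # deterministic decoding for evaluation
        return_full_text=False,   # keep only the model's continuation
    )[0]["generated_text"]
    try:
        return json.loads(out.strip())
    except json.JSONDecodeError:
        return []  # malformed output counts as a missed extraction

print(extract_death_triples(
    "The duel ended quickly: Aldric's blade found Corvin's heart."
))
```

In an evaluation of this kind, the predicted triples would be compared element by element against manual annotations, which is consistent with the per-element F1-scores the abstract reports.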