Anthology of Computers and the Humanities · Volume 4

Annotating and Detecting Quotations: Toward a Unified Framework between Linguistics and Computational Humanities

Agnès Saulnier¹

  • ¹ Institut national de l’audiovisuel, Bry-sur-Marne, France

Permanent Link: https://doi.org/10.63744/UM96NSIlUNaK

Published: 21 May 2025

Keywords: reported speech, quotation, annotated corpora, audiovisual corpora, NLP

Mots clés : discours rapporté, citation, corpus annotés, corpus audiovisuel, TAL

Abstract

Quotation is a central but theoretically unstable object, situated at the intersection of linguistic, narrative, media, and computational traditions. This plurality is reflected in existing annotated corpora, which rely on heterogeneous scopes and categories, with consequences for automatic detection and evaluation. Based on a review of the literature and an analysis of existing corpora, this article shows why these divergences become problematic in computational contexts. It then proposes a typology of quotation grounded in linguistic descriptions, designed to make explicit the scope choices underlying annotated corpora. The discussion shows how this typology can guide annotation practices, support system comparison, and facilitate extensions to audiovisual data.