Annotating and Detecting Quotations: Toward a Unified Framework Between
Linguistics and Computational Humanities

Saulnier, Agnès

doi:10.63744/UM96NSIlUNaK

Abstract

Quotation is a central but theoretically unstable object, situated at the intersection of linguistic, narrative, media, and computational traditions. This plurality is reflected in existing annotated corpora, which rely on heterogeneous scopes and categories, with consequences for automatic detection and evaluation. Based on a review of the literature and an analysis of existing corpora, this article shows why these divergences become problematic in computational contexts. It then proposes a typology of quotation grounded in linguistic descriptions, designed to make explicit the scope choices underlying annotated corpora. The discussion shows how this typology can guide annotation practices, support system comparison, and facilitate extensions to audiovisual data.
