Anthology of Computers and the Humanities · Volume 3

More Sound, More Soundness? Improving Authorship Attribution with Phonemes

Simon Gabay1, Florian Cafiero2 and Jean-Luc Falcone1

  • 1 Université de Genève, Geneva, Switzerland
  • 2 École nationale des chartes | Paris Sciences Lettres, Paris, France

Permanent Link: https://doi.org/10.63744/JsYBzks5UCwg

Published: 21 November 2025

Keywords: Stylometry, Phonetics, Stylistics, Authorship attribution

Abstract

This paper assesses whether converting written French poetry into a speech-oriented representation improves the performance of authorship attribution methods. To this end, we develop a phonetic transcription system that automatically transcribes poems by six authors, including the disputed Illuminations of Rimbaud, and adapt existing tools to ingest and process phonetic data. The output of this grapheme-to-phoneme task is then enriched with minimal prosodic cues: synthetic tokens derived from punctuation and basic French liaisons. Using the same trigram features and classifier across all representations, we observe that moving from orthographic text to phonetic transcriptions with modest prosodic enrichment raises the F-score from 0.89 to 0.95 and reduces inter-author confusion. These results suggest that even lightweight speech-based features, produced with reproducible rules and open tools, can meaningfully enhance stylometric analysis of French verse, and that they warrant further study for contested texts.
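As a rough illustration of the kind of features described above, the following sketch counts trigrams over a phoneme sequence in which punctuation has been replaced by synthetic tokens. It is only a minimal sketch under stated assumptions: the token names (`<SHORT>`, `<LONG>`), the punctuation mapping, the helper names `enrich` and `trigrams`, and the hand-made transcription are all hypothetical and are not taken from the authors' system, which also handles grapheme-to-phoneme conversion and liaisons.

```python
from collections import Counter

# Hypothetical mapping from punctuation to synthetic pause tokens.
# This inventory is an assumption for illustration, not the authors' own.
PAUSE_TOKENS = {
    ",": "<SHORT>",
    ";": "<SHORT>",
    ".": "<LONG>",
    "!": "<LONG>",
    "?": "<LONG>",
}


def enrich(symbols):
    """Replace punctuation marks with synthetic tokens; leave phonemes unchanged."""
    return [PAUSE_TOKENS.get(s, s) for s in symbols]


def trigrams(symbols):
    """Count overlapping trigrams over a sequence of symbols."""
    return Counter(tuple(symbols[i:i + 3]) for i in range(len(symbols) - 2))


# Hand-made, illustrative transcription of "Oui, non." (phonemes plus punctuation).
line = ["w", "i", ",", "n", "ɔ̃", "."]
features = trigrams(enrich(line))

for gram, count in features.items():
    print(gram, count)
```

Because the synthetic tokens sit in the same sequence as the phonemes, trigrams such as `("i", "<SHORT>", "n")` capture which sounds occur around a pause, which plain orthographic trigrams would encode only indirectly.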