Anthology of Computers and the Humanities · Volume 3

Linguistic Tools in Musical Stylometry

Kirill Abrosimov1, Alexander Grebennikov2,3, George Tzanetakis1 and Anna Sidorova4

  • 1 Faculty of Engineering and Computer Science, University of Victoria, Victoria, Canada
  • 2 Faculty of Philology, St. Petersburg State University, Saint Petersburg, Russia
  • 3 Foreign Language Training Center, ITMO University, Saint Petersburg, Russia
  • 4 AI Talent Hub, ITMO University, Saint Petersburg, Russia

Permanent Link: https://doi.org/10.63744/Av1c2rVmcj0N

Published: 21 November 2025

Keywords: musical stylometry, symbolic music processing, representations of music, delta methods, static embeddings

Abstract

In this paper, we investigate the applicability of linguistic stylometry methods to authorship attribution in music. We compare delta methods, which analyze token frequencies, with static embeddings generated by distributional semantic models (Word2Vec and Doc2Vec) for the stylometric analysis of music in a symbolic representation. For this purpose, we use a classical music dataset derived from the MusicNet dataset. With the cosine delta approach, an F1 score of 0.63 was obtained when authorship attribution was framed as classification. Static embeddings achieved an Adjusted Rand Index of 0.42 when it was framed as clustering. In both cases, pre-processed extracted chords were used as tokens. We hypothesize that the frequency with which certain chords are used provides sufficient information to achieve reliable results in symbolic music stylometry. Several chord pre-processing methods were investigated: augmentation, lemmatization, and n-gram formation, with lemmatization shown to be the most effective. The proposed linguistics-influenced methodology, combined with the chord pre-processing methods, can also be used for other tasks in symbolic music information retrieval.
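
To make the cosine delta idea mentioned in the abstract concrete, the following is a minimal sketch of the general technique as it is commonly described in linguistic stylometry: relative frequencies of the most frequent tokens are z-scored across the corpus, and documents are compared by cosine distance. It is not the authors' pipeline. The chord labels, the composer names, the function names, and the `n_features` parameter are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary token in one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[t] / total for t in vocabulary]


def z_scores(freq_rows):
    """Standardize each feature (column) across all documents."""
    columns = list(zip(*freq_rows))
    means = [mean(c) for c in columns]
    # Guard against zero variance so constant features do not divide by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [
        [(v - m) / s for v, m, s in zip(row, means, stds)]
        for row in freq_rows
    ]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def attribute(known, unknown_tokens, n_features=50):
    """Assign the unknown document to the nearest known author.

    known: dict mapping an author label to a list of tokens (assumption:
    one token list per author; a real corpus would have many pieces each).
    """
    labels = list(known)
    docs = [known[label] for label in labels] + [unknown_tokens]
    corpus_counts = Counter(t for doc in docs for t in doc)
    vocabulary = [t for t, _ in corpus_counts.most_common(n_features)]
    rows = [relative_frequencies(doc, vocabulary) for doc in docs]
    z = z_scores(rows)
    distances = {
        label: cosine_distance(z[i], z[-1]) for i, label in enumerate(labels)
    }
    return min(distances, key=distances.get), distances


# Toy data: hypothetical chord tokens, not taken from MusicNet.
known = {
    "composer_a": ["C", "G", "C", "F", "C", "G", "Am", "C"],
    "composer_b": ["Dm", "A7", "Dm", "Gm", "Dm", "A7", "Bb", "Dm"],
}
unknown = ["C", "G", "C", "F", "C", "Am", "C", "G", "F", "C"]

label, distances = attribute(known, unknown)
print(label, distances)
```

In this sketch the attribution is a simple nearest-neighbour decision over cosine distances. How distances are turned into class labels in the actual experiments, and how many most-frequent chords are retained, are details this example does not attempt to reproduce.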