Linguistic Tools in Musical Stylometry

Abrosimov, Kirill; Grebennikov, Alexander; Tzanetakis, George; Sidorova, Anna

doi:10.63744/Av1c2rVmcj0N

Abstract

In this paper, we investigate the applicability of linguistic stylometry methods to authorship attribution in music. We compare delta methods, which analyze token frequencies, with static embeddings generated by distributional semantic models (Word2Vec and Doc2Vec) for the stylometric analysis of music in a symbolic representation. For this purpose, we use a classical music dataset derived from the MusicNet dataset. The cosine delta approach obtained an F1 score of 0.63 when authorship attribution was treated as a classification task, and static embeddings achieved an Adjusted Rand Index of 0.42 in a clustering setting. In both cases, pre-processed chords extracted from the pieces were used as tokens. We hypothesize that the frequency with which certain chords are used provides sufficient information to achieve reliable results in symbolic music stylometry. We investigated several chord pre-processing methods: augmentation, lemmatization, and n-gram formation, of which lemmatization proved the most effective. The proposed linguistics-inspired methodology, combined with the chord pre-processing methods, can also be applied to other tasks in symbolic music information retrieval.
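To illustrate the cosine delta idea mentioned above, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes the common formulation of cosine delta: relative frequencies of the most frequent tokens are z-scored across the corpus, and a query piece is attributed to the composer of the nearest training piece by cosine distance. The chord labels, composer names, function names, and the `n_most_frequent` parameter are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary token within one piece."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[t] / total for t in vocabulary]


def z_scores(freq_rows):
    """Standardize each token column across all pieces (population std)."""
    n = len(freq_rows)
    columns = list(zip(*freq_rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in freq_rows
    ]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def attribute(train_docs, train_labels, query_doc, n_most_frequent=50):
    """Nearest-neighbour attribution using cosine delta (assumed variant)."""
    all_docs = train_docs + [query_doc]
    corpus_counts = Counter(t for doc in all_docs for t in doc)
    vocabulary = [t for t, _ in corpus_counts.most_common(n_most_frequent)]
    rows = [relative_frequencies(doc, vocabulary) for doc in all_docs]
    z = z_scores(rows)
    query = z[-1]
    distances = [cosine_distance(query, row) for row in z[:-1]]
    best = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[best]


# Hypothetical toy data: each piece is a sequence of chord tokens.
train_docs = [
    ["C", "G", "C", "F", "C", "G", "C", "C"],
    ["C", "F", "C", "G", "C", "C", "G", "C"],
    ["Am", "Dm", "E", "Am", "Dm", "Am", "E", "Am"],
    ["Am", "E", "Am", "Dm", "Am", "Am", "E", "Dm"],
]
train_labels = ["composer_a", "composer_a", "composer_b", "composer_b"]
query_doc = ["C", "G", "C", "F", "C", "G", "C", "F"]

predicted = attribute(train_docs, train_labels, query_doc)
print(predicted)
```

In this sketch the query piece is standardized together with the training pieces, which is convenient for a toy example; a real evaluation would likely fit the means and standard deviations on the training pieces only.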