Anthology of Computers and the Humanities · Volume 3

Identifying Stance-Bearing Keywords in Public Debates with Instruction-Tuned Language Models

Milena Belosevic

¹ Faculty of Linguistics and Literary Studies, Bielefeld University, Bielefeld, Germany

Permanent Link: https://doi.org/10.63744/NskPfU7etX83

Published: 21 November 2025

Keywords: language models, stance detection, instruction-tuning, human annotation, politically salient language, discourse analysis, argumentation, keyword extraction

Abstract

This paper presents a computational approach to discourse analysis that identifies stance-bearing keywords: evaluative lexical units embedded in arguments that signal support for or opposition to a debated issue. Drawing on a dataset of 4,035 arguments from the German migration debate of 2015–2017, we explore how language models can be instruction-tuned to extract such lexical units. Human-annotated keywords serve as the basis for training and evaluation, providing a gold standard for model comparison. We apply supervised fine-tuning to BübleLM, a German-only language model, and compare it with both a three-shot prompted variant and two established baselines: a German BERT model tuned for named entity recognition (NER) and a similarity-based keyword extractor (KeyBERT with RoBERTa embeddings). The results show that the instruction-tuned BübleLM aligns more closely with the human annotations than either the baselines or its three-shot prompted variant. This suggests that domain-specific tuning can capture evaluative cues central to discourse-specific argumentation. The study contributes to ongoing efforts in computational humanities to combine machine learning with discourse-sensitive human annotation of politically salient language.