QaLLM: An LLM-based NER Dataset Curation, Annotation and Evaluation
in Historical Urdu Elegies

Irfan, Saniya; Ali, Syed Juned

doi:10.63744/nxwIBAfXngn3

Abstract

Digital humanities increasingly use computational tools to analyze large literary corpora, yet low-resource, right-to-left languages like Urdu remain underserved, hindering research on culturally rich genres such as Marsiya, South Asia’s elegiac poetry tradition. Named-entity recognition (NER) in Marsiya poses specific challenges: existing methods either ignore the genre or depend on costly manual annotation, limiting digital humanities research in the Global South. To address this, we present QaLLM, an end-to-end framework leveraging large language models (LLMs) for Urdu Marsiya NER with a human-in-the-loop validation stage. We conduct an empirical analysis comparing multiple state-of-the-art LLMs and prompting configurations, and employ an LLM-as-a-Judge strategy using independent models to evaluate tagging quality. Results show that LLMs can serve as reliable first-pass annotators and reviewers, enabling efficient tagging and validation. Our contributions are: (i) the first publicly available Urdu Marsiya NER dataset, (ii) an open, reproducible methodology for low-resource, right-to-left NER with human and LLM-based validation, and (iii) an extensive comparative evaluation of LLMs and prompting strategies. The framework generalizes to other low-resource, complex-script languages, supporting reproducible digital scholarship and inclusive computational analysis of global literary heritage.
