Evaluating Prompt-Based and Fine-Tuned Approaches to Czech Anaphora Resolution

Authors

STANO Patrik, HORÁK Aleš

Year of publication 2025
Type Article in Proceedings
Conference Text, Speech, and Dialogue, TSD 2025
MU Faculty or unit

Faculty of Informatics

Keywords anaphora resolution, sequence-to-sequence models, fine-tuning, prompt engineering
Description Anaphora resolution plays a critical role in natural language understanding, especially in morphologically rich languages like Czech. This paper presents a comparative evaluation of two modern approaches to anaphora resolution on Czech text: prompt engineering with large language models (LLMs) and fine-tuning compact generative models. Using a dataset derived from the Prague Dependency Treebank, we evaluate several instruction-tuned LLMs, including Mistral Large 2 and Llama 3, using a series of prompt templates. We compare them against fine-tuned variants of the mT5 and Mistral models that we trained specifically for Czech anaphora resolution. Our experiments demonstrate that while prompting yields promising few-shot results (up to 74.5% accuracy), the fine-tuned models, particularly mT5-large, outperform them significantly, achieving up to 88% accuracy while requiring fewer computational resources. We analyze performance across different anaphora types, antecedent distances, and source corpora, highlighting key strengths and trade-offs of each approach.
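To illustrate the prompt-engineering side of the comparison, the sketch below builds a few-shot prompt asking an instruction-tuned LLM to name the antecedent of a bracketed Czech pronoun. The instruction wording, the bracket-marking convention, and the example pairs are invented for illustration only; they are not the paper's actual templates or PDT-derived data.

```python
# Hypothetical few-shot prompt construction for Czech anaphora resolution.
# All template text and examples here are illustrative assumptions, not the
# templates evaluated in the paper.

FEW_SHOT_EXAMPLES = [
    # (text with the pronoun marked in [brackets], expected antecedent)
    ("Petr potkal Janu. Dal [jí] knihu.", "Janu"),
    ("Koupili jsme nové auto. [To] bylo drahé.", "nové auto"),
]

def build_prompt(text_with_pronoun: str) -> str:
    """Compose a few-shot prompt that asks the model to output the
    antecedent of the pronoun marked with [brackets]."""
    lines = [
        "Urči antecedent zájmena v hranatých závorkách."
        " (Identify the antecedent of the bracketed pronoun.)",
        "",
    ]
    for text, antecedent in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {text}")
        lines.append(f"Antecedent: {antecedent}")
        lines.append("")
    # The unanswered query comes last; the model completes the antecedent.
    lines.append(f"Text: {text_with_pronoun}")
    lines.append("Antecedent:")
    return "\n".join(lines)

prompt = build_prompt("Marie ztratila klíče. Hledala [je] celý den.")
print(prompt)
```

A fine-tuned sequence-to-sequence model such as mT5 would instead be trained directly on (marked text, antecedent) pairs, so no instruction or in-context examples are needed at inference time.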
