Evaluating Prompt-Based and Fine-Tuned Approaches to Czech Anaphora Resolution

Authors

STANO Patrik, HORÁK Aleš

Year of publication 2025
Type Article in Proceedings
Conference Text, Speech, and Dialogue, TSD 2025
MU Faculty or unit

Faculty of Informatics

Keywords anaphora resolution, sequence-to-sequence models, fine-tuning, prompt engineering
Description Anaphora resolution plays a critical role in natural language understanding, especially in morphologically rich languages like Czech. This paper presents a comparative evaluation of two modern approaches to anaphora resolution on Czech text: prompt engineering with large language models (LLMs) and fine-tuning compact generative models. Using a dataset derived from the Prague Dependency Treebank, we evaluate several instruction-tuned LLMs, including Mistral Large 2 and Llama 3, using a series of prompt templates. We compare them against fine-tuned variants of the mT5 and Mistral models that we trained specifically for Czech anaphora resolution. Our experiments demonstrate that while prompting yields promising few-shot results (up to 74.5% accuracy), the fine-tuned models, particularly mT5-large, outperform them significantly, achieving up to 88% accuracy while requiring fewer computational resources. We analyze performance across different anaphora types, antecedent distances, and source corpora, highlighting key strengths and trade-offs of each approach.
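To illustrate the prompt-engineering side of the comparison, the sketch below builds a few-shot prompt asking an instruction-tuned LLM to name the antecedent of a bracketed Czech pronoun. The instruction wording, the bracket-marking convention, and the example pairs are invented for illustration only; they are not the paper's actual templates or PDT-derived data.

```python
# Hypothetical few-shot prompt construction for Czech anaphora resolution.
# All template text and examples here are illustrative assumptions, not the
# templates evaluated in the paper.

FEW_SHOT_EXAMPLES = [
    # (text with the pronoun marked in [brackets], expected antecedent)
    ("Petr potkal Janu. Dal [jí] knihu.", "Janu"),
    ("Koupili jsme nové auto. [To] bylo drahé.", "nové auto"),
]

def build_prompt(text_with_pronoun: str) -> str:
    """Compose a few-shot prompt that asks the model to output the
    antecedent of the pronoun marked with [brackets]."""
    lines = [
        "Urči antecedent zájmena v hranatých závorkách."
        " (Identify the antecedent of the bracketed pronoun.)",
        "",
    ]
    for text, antecedent in FEW_SHOT_EXAMPLES:
        lines.append(f"Text: {text}")
        lines.append(f"Antecedent: {antecedent}")
        lines.append("")
    # The unanswered query comes last; the model completes the antecedent.
    lines.append(f"Text: {text_with_pronoun}")
    lines.append("Antecedent:")
    return "\n".join(lines)

prompt = build_prompt("Marie ztratila klíče. Hledala [je] celý den.")
print(prompt)
```

A fine-tuned sequence-to-sequence model such as mT5 would instead be trained directly on (marked text, antecedent) pairs, so no instruction or in-context examples are needed at inference time.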
