Evaluation of the Cross-lingual Embedding Models from the Lexicographic Perspective

Denisová, Michaela; Rychlý, Pavel

Authors	DENISOVÁ Michaela, RYCHLÝ Pavel
Year of publication	2023
Type	Article in Proceedings
Conference	Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference
MU Faculty or unit	Faculty of Informatics
web	Full text
Keywords	cross-lingual embedding models; bilingual lexicon induction task; retrieving translation equivalents; evaluation
Description	Cross-lingual embedding models (CMs) enable the transfer of lexical knowledge across languages, which makes them a useful approach for retrieving translation equivalents in lexicography. However, these models have been oriented mainly towards natural language processing (NLP) and lack proper evaluation: they are evaluated on error-prone datasets that were compiled automatically. This causes discrepancies between models and hinders the correct interpretation of the results. In this paper, we aim to address these issues and make these models more accessible for lexicography by evaluating them from a lexicographic point of view. We evaluate three benchmark CMs on three diverse language pairs: closely related languages, distant languages, and languages with different scripts. Additionally, we propose key parameters that an evaluation dataset should include to meet lexicographic needs, yield reproducible results, accurately reflect model performance, and support setting appropriate parameters during training. Our code and evaluation datasets are publicly available.
Related projects:	Interní grantová agentura Masarykovy univerzity Finding translation equivalents without parallel texts