Towards Perfection of Machine Learning of Competing Patterns: The Use Case of Czechoslovak Patterns Development

Authors

SOJKA Ondřej, SOJKA Petr

Year of publication 2023
Type Article in Proceedings
Conference Recent Advances in Slavonic Natural Language Processing (RASLAN 2023)
MU Faculty or unit

Faculty of Informatics

Keywords dictionary problem; effectiveness; hyphenation patterns; patgen; syllabification; Czech; Slovak; Czechoslovak patterns; machine learning
Description Finding a space- and time-effective, or even perfect, solution to the dictionary problem is an important practical and research problem whose solution may lead to a breakthrough in computation. Competing pattern technology from TeX is a special case: for a given dictionary, the word segmentation is stored in competing patterns, which nevertheless generalize very well. Recently, the unreasonable effectiveness of pattern generation has been shown: hyphenation patterns can solve the dictionary problem jointly, even for several languages, without compromise.

In this article, we study the effectiveness of patgen for supervised machine learning of Czechoslovak hyphenation patterns. We show machine learning techniques for developing competing patterns that are close to perfect. We evaluate the new approach by the improvements and space savings gained during the development and fine-tuning of the Czechoslovak hyphenation patterns.
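
As a rough illustration of how competing patterns store a word segmentation, the following minimal Python sketch applies Liang-style patterns to a word. It is not the authors' code and not patgen itself. The function names `parse_pattern` and `hyphenate`, the `left_min`/`right_min` defaults, and the small English pattern list are assumptions chosen only for the example; the patterns are not taken from the Czechoslovak pattern set. Digits in a pattern are levels: odd levels allow a break, even levels forbid it, and at each inter-letter position the highest level among all matching patterns wins.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'hen5at' into its letters and inter-letter levels."""
    letters = re.sub(r"\d", "", pattern)
    levels = [0] * (len(letters) + 1)
    pos = 0  # number of letters consumed so far
    for ch in pattern:
        if ch.isdigit():
            levels[pos] = int(ch)
        else:
            pos += 1
    return letters, levels


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert '-' wherever the winning (maximum) level is odd.

    left_min and right_min are illustrative minimum fragment lengths.
    """
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[k] is the gap before dotted[k]
    for pattern in patterns:
        letters, levels = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            # Patterns compete: the highest level at each gap wins.
            for offset, level in enumerate(levels):
                scores[start + offset] = max(scores[start + offset], level)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for p in range(left_min, len(word) - right_min + 1):
        if scores[p + 1] % 2 == 1:  # odd level: a break is allowed here
            pieces.append(word[last:p])
            last = p
    pieces.append(word[last:])
    return "-".join(pieces)


# Illustrative English patterns (an assumption for this example only).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

if __name__ == "__main__":
    print(hyphenate("hyphenation", PATTERNS))
```

With only the pattern `1na`, the sketch would break the word before "na"; adding `he2n` suppresses that break, and `hen5at` then permits a break one position later. This layering of allowing and inhibiting levels is what lets a compact pattern set encode a dictionary's segmentation while still generalizing to unseen words.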
