DSL Shared task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model

Warning

This publication does not belong to the Institute of Computer Science but to the Faculty of Informatics. The official publication page is on the muni.cz website.

Authors	HERMAN Ondřej, SUCHOMEL Vít, BAISA Vít, RYCHLÝ Pavel
Year of publication	2016
Type	Article in proceedings
Conference	Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)
MU Faculty or unit	Faculty of Informatics
Web	https://aclanthology.info/pdf/W/W16/W16-4815.pdf
Field	Informatics
Keywords	language discrimination; expectation maximization; language model
Description	In this paper we investigate two approaches to the discrimination of similar languages: the expectation-maximization algorithm for estimating the conditional probability P(word|language), and byte-level language models similar to compression-based language modelling methods. These methods reached accuracies of 86.6% and 88.3%, respectively, on set A of the DSL Shared Task 2016 competition.
Related projects	Harvesting big text data for under-resourced languages; Rozsáhlé výpočetní systémy: modely, aplikace a verifikace V. (Large Computing Systems: Models, Applications and Verification V)
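
The description mentions byte-level language models as one of the two approaches. As a rough illustration of the general idea only, the sketch below trains one byte trigram model per language and assigns a text to the language whose model gives it the highest log-probability. It is not the authors' chunk-based model: the class `ByteNgramModel`, the function `classify`, the add-one smoothing, and the toy Czech and Slovak sentences are all assumptions made for this example.

```python
import math
from collections import Counter


class ByteNgramModel:
    """Byte-level n-gram language model with add-one smoothing (illustrative)."""

    def __init__(self, order=3):
        self.order = order
        self.ngrams = Counter()    # counts of context followed by the next byte
        self.contexts = Counter()  # counts of the context alone

    def _pad(self, text):
        # Prefix with zero bytes so the first bytes also have a full context.
        return b"\x00" * (self.order - 1) + text.encode("utf-8")

    def train(self, text):
        data = self._pad(text)
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            self.ngrams[context + data[i:i + 1]] += 1
            self.contexts[context] += 1

    def log_prob(self, text):
        data = self._pad(text)
        total = 0.0
        for i in range(self.order - 1, len(data)):
            context = data[i - self.order + 1:i]
            count = self.ngrams[context + data[i:i + 1]]
            # Add-one smoothing over the 256 possible byte values.
            total += math.log((count + 1) / (self.contexts[context] + 256))
        return total


def classify(models, text):
    """Return the label whose model gives the text the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy training data, invented for illustration only.
samples = {
    "cs": ["to je dobrý den", "kde je nádraží", "děkuji vám moc"],
    "sk": ["to je dobrý deň", "kde je stanica", "ďakujem vám veľmi pekne"],
}
models = {}
for label, sentences in samples.items():
    model = ByteNgramModel(order=3)
    for sentence in sentences:
        model.train(sentence)
    models[label] = model

print(classify(models, "ďakujem pekne"))
```

Working on bytes rather than words avoids tokenization and handles any script, which is one reason byte-level models are attractive for closely related languages. A realistic system would train on far more data and would likely use a stronger smoothing scheme than add-one.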