Frequency of Low-Frequency Words in Text Corpora
| Authors | |
|---|---|
| Year of publication | 2010 |
| Type | Article in Proceedings |
| Conference | Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2010 |
| MU Faculty or unit | |
| Citation | |
| www | https://nlp.fi.muni.cz/raslan/2010/paper15.pdf |
| Field | Linguistics |
| Keywords | Computational linguistics; Language model; Low-frequency; Text analysis; Text corpora |
| Description | Low-frequency words, especially words occurring only once in a text corpus, are very popular in text analysis. Many lexicographers also draw attention to such words. This paper presents a detailed statistical analysis of low-frequency words. The results provide important information for many practical applications, including lexicography and language modeling. |
| Related projects: |