Frequency of Low-Frequency Words in Text Corpora
| Authors | |
|---|---|
| Year of publication | 2010 |
| Type | Article in Proceedings |
| Conference | Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2010 |
| MU Faculty or unit | |
| Citation | |
| web | https://nlp.fi.muni.cz/raslan/2010/paper15.pdf |
| Field | Linguistics |
| Keywords | Computational linguistics; Language model; Low-frequency; Text analysis; Text corpora |
| Description | Low-frequency words, especially words that occur only once in a text corpus, receive a lot of attention in text analysis, and many lexicographers also take an interest in them. This paper presents a detailed statistical analysis of low-frequency words. The results provide important information for many practical applications, including lexicography and language modeling. |