csTenTen17, a Recent Czech Web Corpus
| Authors | |
|---|---|
| Year of publication | 2018 |
| Type | Article in Proceedings |
| Conference | Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2018 |
| MU Faculty or unit | |
| Citation | |
| Web | https://nlp.fi.muni.cz/raslan/2018/paper10-Suchomel.pdf |
| Keywords | Czech corpus; web corpus; text processing |
| Description | This article introduces csTenTen17, a very large Czech text corpus for language research, compiled from texts downloaded in 2015, 2016 and 2017. The corpus consists of 10.5 billion words, double the size of its predecessor from 2012. A brief comparison with other recent Czech corpora follows. |