Přegenerování a podgenerování : Jak efektivně vyhledávat v jazykových korpusech data pro lingvistický výzkum
| Title in English | Over/under Generating : How to Search Data for Linguistic Analysis in Language Corpora |
|---|---|
| Authors | |
| Year of publication | 2024 |
| Type | Requested lectures |
| MU Faculty or unit | |
| Citation | |
| Description | In this talk, we show how to minimize overgeneration (to increase accuracy) and prevent undergeneration (to maintain coverage) in corpus-based word-formation research. Using the retrieval of candidates for one word-formation model (kutil) as a concrete example, we show how observing corpus data allows a corpus query to be refined step by step. The data obtained from the corpus are analysed both quantitatively and qualitatively. We then show to what extent the homonymy of nouns formed by conversion from l-participles negatively affects the results of POS disambiguation. |