Tailored Fine-Tuning For The Comma Insertion In Czech
| Authors | |
|---|---|
| Year of publication | 2025 |
| Type | Article in a scholarly journal |
| Journal / Source | Jazykovedný časopis |
| Faculty / MU department | |
| Citation | |
| www | https://www.juls.savba.sk/ediela/jc/2025/1/jc25-01.pdf |
| DOI | https://doi.org/10.2478/jazcas-2025-0024 |
| Keywords | comma; Czech language; fine-tuning; large language model (LLM) |
| Description | Transfer learning builds on models, particularly pre-trained Transformers, that are trained on vast amounts of text in a given language and can then be tailored to specific grammar correction tasks, such as automatic punctuation correction. The Czech pre-trained RoBERTa model demonstrates outstanding performance on this task (Machura et al. 2022); however, previous attempts to improve the model have so far led to slight degradation (Machura et al. 2023). In this paper, we present a more targeted fine-tuning of this model, addressing linguistic phenomena that the base model overlooked. We also provide a comparison with other models trained on a more diverse dataset than web texts alone. |
| Related projects | |
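
As a rough illustration of the approach the abstract describes, comma insertion is commonly framed as binary token classification: commas are stripped from the training text, and the model learns to predict, for each remaining word, whether a comma should follow it. The sketch below shows this framing with the Hugging Face transformers library; the `ufal/robeczech-base` checkpoint and the toy sentence are stand-in assumptions, not the authors' actual model, data, or fine-tuning recipe.

```python
# A minimal sketch of comma insertion as binary token classification:
# strip commas from a sentence, then label each remaining word with
# whether a comma should follow it.
# Assumptions (not from the paper): "ufal/robeczech-base" stands in
# for the Czech RoBERTa the authors fine-tune, and the toy example
# replaces their training corpus.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "ufal/robeczech-base"  # hypothetical stand-in checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)

def make_example(sentence: str):
    """Turn a comma-annotated sentence into (words, labels):
    label 1 = a comma follows this word, 0 = no comma."""
    words, labels = [], []
    for raw in sentence.split():
        if raw.endswith(","):
            words.append(raw[:-1])
            labels.append(1)
        else:
            words.append(raw)
            labels.append(0)
    return words, labels

words, labels = make_example("Myslím, že to zvládneme.")

# Tokenize pre-split words; label only the first sub-token of each word.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
token_labels = []
prev = None
for wid in enc.word_ids():
    if wid is None or wid == prev:
        token_labels.append(-100)  # special or continuation token: ignored by the loss
    else:
        token_labels.append(labels[wid])
    prev = wid

outputs = model(**enc, labels=torch.tensor([token_labels]))
outputs.loss.backward()  # one toy gradient step of fine-tuning
```

In practice the fine-tuning would run over a large comma-stripped corpus with a full training loop; the `-100` labels keep continuation sub-tokens out of the loss so that each word is scored exactly once.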