Topic Modelling of the Czech Supreme Court Decisions

Warning

This publication does not belong to the Institute of Computer Science but to the Faculty of Law. The official page of the publication is on the muni.cz website.

Authors

NOVOTNÁ Tereza, HARAŠTA Jakub, KÓL Jakub

Year of publication: 2020
Type: Other conference presentation
Faculty / Unit (MU)

Faculty of Law

Description: The Czech Supreme Court issues several thousand decisions per year. These decisions are published as almost unprocessed full text with only minimal basic metadata (date of the decision, docket number), which makes case law research very time-consuming. New automatic methods of processing court decisions therefore need to be developed so that relevant case law can be retrieved more efficiently. Topic modelling methods have the potential to cluster a large number of documents automatically or to enrich them with new categories of relevant metadata. In this paper, two topic modelling methods, latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), are applied to a corpus of Czech Supreme Court decisions. For each method, several models are trained and compared by their coherence scores to find the best number of topics. The authors then perform a manual qualitative analysis of the most coherent models.
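The model-selection step described above (train models with different numbers of topics, keep the most coherent one) can be sketched as follows. This is a minimal, dependency-free Python illustration under stated assumptions, not the authors' pipeline: it covers only NMF (using Lee and Seung's multiplicative updates), it uses the UMass coherence measure because the abstract does not say which coherence measure was used, and the helper names (`build_matrix`, `nmf`, `umass_coherence`, `select_num_topics`) and the toy corpus are hypothetical. An LDA model would slot into the same loop in place of `nmf`.

```python
import math
import random
from collections import Counter


def build_matrix(docs):
    """Build a bag-of-words document-term count matrix (rows = documents)."""
    vocab = sorted({w for d in docs for w in d})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w, c in Counter(d).items():
            V[i][index[w]] = float(c)
    return V, vocab


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x k) @ H (k x m), all entries non-negative.

    Uses Lee and Seung's multiplicative updates for the Frobenius loss.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        WtV = [[sum(W[i][a] * V[i][j] for i in range(n)) for j in range(m)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(m)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(m)) for a in range(k)] for i in range(n)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(m)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(n)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def umass_coherence(top_words, doc_sets):
    """UMass coherence of one topic's ranked top words; higher is better.

    Assumption: UMass stands in for whatever coherence measure the authors used.
    """
    def df(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((df(top_words[i], top_words[j]) + 1) / df(top_words[j]))
    return score


def select_num_topics(docs, candidates, top_n=3):
    """Train one NMF model per candidate topic count; return the most coherent count."""
    V, vocab = build_matrix(docs)
    doc_sets = [set(d) for d in docs]
    scores = {}
    for k in candidates:
        _, H = nmf(V, k)
        per_topic = []
        for row in H:
            # Rank the vocabulary by this topic's weights and keep the top_n words.
            ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
            per_topic.append(umass_coherence([vocab[j] for j in ranked[:top_n]], doc_sets))
        scores[k] = sum(per_topic) / k  # mean coherence over all topics
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy corpus of tokenised "decisions" (illustrative only, not real data).
    docs = [
        ["contract", "breach", "damages"],
        ["contract", "breach", "seller"],
        ["custody", "child", "divorce"],
        ["divorce", "child", "alimony"],
    ]
    best, scores = select_num_topics(docs, candidates=(2, 3, 4))
    print(best, scores)
```

With real decisions, the texts would first need pre-processing (tokenisation, stop-word removal and, for Czech, ideally lemmatisation), and in practice a library such as gensim or scikit-learn would replace the hand-written factorization. The manual qualitative review of the most coherent models has no code counterpart here.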