On Dimensionality of Latent Semantic Indexing for Text Segmentation
| Title in Czech | K dimenzionalitě Latentního Sémantického Indexování pro segmentaci textu |
|---|---|
| Authors | |
| Year of publication | 2007 |
| Type | Article in a scholarly periodical |
| Journal / Source | Proceedings of the International Multiconference on Computer Science and Information Technology |
| MU Faculty / Unit | |
| Citation | |
| www | http://www.papers2007.imcsit.org/ |
| Field | Informatics |
| Keywords | text segmentation; LSI; latent semantic indexing |
| Description | In this paper we propose desirable features of linear text segmentation algorithms for the Information Retrieval domain, with emphasis on improving high-similarity search of heterogeneous texts. We then describe a robust, purely statistical method, based on exploiting context overlap, that exhibits these features. Ways to automatically determine its internal parameter, the dimensionality of the latent space, are discussed and evaluated on a data set. |