Selecting Interesting Articles Using Their Similarity Based Only on Positive Examples

Hroza, Jiří; Žižka, Jan

Authors	HROZA Jiří ŽIŽKA Jan
Year of publication	2005
Type	Article in Proceedings
Conference	Computational Linguistics and Intelligent Text Processing
MU Faculty or unit	Faculty of Informatics
Field	Informatics
Keywords	machine learning; text categorization; text filtration; text similarity; k-NN; ranking
Description	Automated searching for interesting text documents frequently suffers from a very poor balance between documents representing positive and negative examples, or from one class missing entirely. This paper proposes a ranking approach based on the k-NN algorithm, adapted to determine the degree of similarity of new documents to a representative collection of positive examples only. In terms of the precision-recall trade-off, a user can decide in advance how many articles, and how similar, should pass through the filter.
Related projects	Human-computer interaction, dialog systems and assistive technologies
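
The ranking idea in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the document representation, the similarity measure, or how the k neighbours are combined. The sketch assumes bag-of-words term frequencies, cosine similarity, and a score equal to the mean similarity to the k nearest positive examples. All function and parameter names (vectorize, knn_score, rank_and_filter, top_n, min_score) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (an assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_score(doc, positives, k=3):
    """Mean similarity of doc to its k most similar positive examples.

    Assumption: averaging the top-k similarities is one plausible way to
    turn k-NN into a similarity degree; the paper may combine them differently.
    """
    vec = vectorize(doc)
    sims = sorted((cosine(vec, vectorize(p)) for p in positives), reverse=True)
    top = sims[:k]
    return sum(top) / len(top) if top else 0.0


def rank_and_filter(candidates, positives, k=3, top_n=None, min_score=0.0):
    """Rank candidates by score, keep those with score >= min_score,
    and return at most top_n of them as (score, document) pairs."""
    scored = sorted(
        ((knn_score(c, positives, k), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    kept = [(s, c) for s, c in scored if s >= min_score]
    return kept if top_n is None else kept[:top_n]


# Toy data, invented purely for illustration.
positives = [
    "machine learning for text categorization",
    "text similarity with k-NN ranking",
    "filtering text documents by similarity",
]
candidates = ["ranking text documents with k-NN", "apple pie recipe"]

for score, doc in rank_and_filter(candidates, positives, k=2, min_score=0.1):
    print(f"{score:.2f}  {doc}")
```

In this sketch, top_n and min_score stand in for the two choices the description leaves to the user: how many articles pass the filter and how similar they must be. Raising min_score would typically favour precision, while raising top_n would typically favour recall.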