On the evaluation and optimization of LabeledPAM

Varování

Publikace nespadá pod Ústav výpočetní techniky, ale pod Fakultu informatiky. Oficiální stránka publikace je na webu muni.cz.
Autoři

JÁNOŠOVÁ Miriama LANG Andreas BUDÍKOVÁ Petra SCHUBERT Erich DOHNAL Vlastislav

Rok publikování 2025
Druh Článek v odborném periodiku
Časopis / Zdroj Information Systems
Fakulta / Pracoviště MU

Fakulta informatiky

Citace
www https://www.sciencedirect.com/science/article/pii/S030643792500064X
Doi https://doi.org/10.1016/j.is.2025.102580
Klíčová slova semi-supervised clustering; k-medoids; partitioning around medoids; FasterPAM; semi-supervised classification
Popis The analysis of complex and weakly labeled data is increasingly popular. Traditional unsupervised clustering aims to uncover interrelated sets of objects based on feature-based similarity. This approach often reaches its limits when dealing with complex multimedia data due to the curse of dimensionality, presenting unique challenges. Semi-supervised clustering, which leverages small amounts of labeled data, has the potential to cope with this problem. In this work, we delve into LabeledPAM, a semi-supervised clustering method, which extends FasterPAM, a state-of-the-art ??-medoids clustering algorithm. Our algorithm is designed for both semi-supervised classification, where labels are assigned to clusters with minimal labeled data, and semi-supervised clustering, where new clusters with unknown labels are identified. We propose an optimization to the original LabeledPAM algorithm that reduces its computational complexity. Additionally, we provide an implementation in Rust, which integrates seamlessly with Python libraries. To assess LabeledPAM’s performance, we empirically evaluate its properties by comparing it against a range of semi-supervised clustering algorithms, including density-based ones. We conduct experiments on a collection of real-world datasets. Our results demonstrate that LabeledPAM achieves competitive clustering quality while maintaining efficiency across various scenarios, showing its versatility for real-world applications.
Související projekty:

Používáte starou verzi internetového prohlížeče. Doporučujeme aktualizovat Váš prohlížeč na nejnovější verzi.

Další info