On the evaluation and optimization of LabeledPAM
Authors | |
---|---|
Year of publication | 2025 |
Type | Article in a scholarly journal |
Journal / Source | Information Systems |
MU Faculty or unit | |
Citation | |
www | https://www.sciencedirect.com/science/article/pii/S030643792500064X |
DOI | https://doi.org/10.1016/j.is.2025.102580 |
Keywords | semi-supervised clustering; k-medoids; partitioning around medoids; FasterPAM; semi-supervised classification |
Description | The analysis of complex and weakly labeled data is increasingly popular. Traditional unsupervised clustering aims to uncover interrelated sets of objects based on feature-based similarity. This approach often reaches its limits when dealing with complex multimedia data due to the curse of dimensionality, presenting unique challenges. Semi-supervised clustering, which leverages small amounts of labeled data, has the potential to cope with this problem. In this work, we delve into LabeledPAM, a semi-supervised clustering method, which extends FasterPAM, a state-of-the-art k-medoids clustering algorithm. Our algorithm is designed for both semi-supervised classification, where labels are assigned to clusters with minimal labeled data, and semi-supervised clustering, where new clusters with unknown labels are identified. We propose an optimization to the original LabeledPAM algorithm that reduces its computational complexity. Additionally, we provide an implementation in Rust, which integrates seamlessly with Python libraries. To assess LabeledPAM’s performance, we empirically evaluate its properties by comparing it against a range of semi-supervised clustering algorithms, including density-based ones. We conduct experiments on a collection of real-world datasets. Our results demonstrate that LabeledPAM achieves competitive clustering quality while maintaining efficiency across various scenarios, showing its versatility for real-world applications. |