Abstract: Safe semi-supervised clustering (S3C) has received increasing attention in the semi-supervised learning community in recent years. Existing S3C methods generally first estimate the risk of labeled instances and then mitigate the resulting negative impact through risk-based regularization. However, these methods do not eliminate the adverse effects of high-probability mislabeled instances (HPMIs), nor do they effectively exploit the useful discriminative information those instances carry. To address these issues, we propose an improved S3C method based on the capped ℓ2,1 norm, called CapS3FCM. The motivation is that the capped ℓ2,1 norm can effectively filter out or identify mislabeled instances. Accordingly, CapS3FCM introduces two capped ℓ2,1 norms. The first exploits label information while alleviating the negative influence of mislabeled instances, especially HPMIs. The second discovers useful discriminative information in those HPMIs. A loss function based on the two capped ℓ2,1 norms is then built, and the resulting optimization problem is solved with an efficient iterative strategy. Experiments on several datasets show that CapS3FCM can outperform other semi-supervised and S3C methods, supporting the practicality and effectiveness of the capped ℓ2,1 norm.
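To illustrate why a capped ℓ2,1 norm can limit the influence of mislabeled instances, the sketch below uses the commonly cited form, the sum over rows of min(‖row‖₂, cap). The abstract does not give CapS3FCM's actual loss, so the function names, the `cap` parameter, and the toy residual matrix are all assumptions made for illustration only.

```python
import numpy as np

def capped_l21_norm(residuals, cap):
    """Sum over rows of min(||row||_2, cap) (assumed generic form)."""
    row_norms = np.linalg.norm(residuals, axis=1)
    return float(np.minimum(row_norms, cap).sum())

def capped_rows(residuals, cap):
    """Boolean mask of rows whose l2 norm reaches the cap."""
    return np.linalg.norm(residuals, axis=1) >= cap

# Hypothetical per-instance residuals between predicted memberships and
# the given labels; the third row mimics a likely mislabeled instance.
residuals = np.array([[0.1, 0.0],
                      [0.0, 0.2],
                      [3.0, 4.0]])

total = capped_l21_norm(residuals, cap=1.0)
flagged = capped_rows(residuals, cap=1.0)
```

In this toy setting the third row has ℓ2 norm 5 but contributes only the cap value to the total, so one badly fitting labeled instance cannot dominate the objective, and the mask marks it as a candidate mislabeled instance.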
External IDs:dblp:journals/fss/Gan00Y025