Abstract: While $K$-means is a standard clustering algorithm, its performance can be compromised by outliers and high-dimensional noisy variables. This paper proposes adaptively robust and sparse $K$-means clustering (ARSK) to address these practical limitations of the standard $K$-means algorithm. For robustness, we introduce a redundant error component for each observation, and this additional parameter is penalized using a group sparse penalty. To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and imposing a penalty to control the sparsity of the weight vector. The tuning parameters that control robustness and sparsity are selected by the $\rm Gap$ statistic. Through simulation experiments and real data analysis, we demonstrate the proposed method's superiority over existing algorithms in simultaneously identifying clusters free of outliers and selecting informative variables.
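For intuition, the robustness mechanism described in the abstract can be sketched as follows. The display below is an illustrative formulation assumed from that description (the symbols $x_i$, $\mu_k$, $e_i$, and $\lambda$ are our notation, not necessarily the paper's exact objective): each observation receives its own redundant error vector, and a group sparse penalty forces that vector to zero unless the observation is an outlier.

$$
\min_{\{\mu_k\}_{k=1}^{K},\;\{e_i\}_{i=1}^{n}} \; \sum_{i=1}^{n} \min_{1 \le k \le K} \bigl\| x_i - \mu_k - e_i \bigr\|_2^2 \;+\; \lambda \sum_{i=1}^{n} \bigl\| e_i \bigr\|_2 .
$$

Under this form, the optimal $e_i$ is exactly zero whenever the residual of observation $i$ is small relative to $\lambda$, so clean observations contribute to the cluster centers as usual, while for outliers $e_i$ absorbs the gross deviation and effectively trims them from the center estimates. In the proposed ARSK objective the squared distances are additionally weighted coordinate-wise, with a sparsity penalty on the weight vector suppressing noisy variables, and both tuning parameters (robustness and sparsity) are chosen via the Gap statistic.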
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=6VGiwTmCem
Changes Since Last Submission: Changed the non-anonymous GitHub link to an anonymous GitHub link and anonymized all content unrelated to the paper.
Code: https://github.com/lee1995hao/ARSK
Assigned Action Editor: ~Bruno_Loureiro1
Submission Number: 3073