Keywords: Clustering; learning-augmented algorithm
Abstract: Line-based clustering is a natural extension of the classical $k$-clustering problem, with applications in computer vision, missing data analysis, and related areas. Despite its practical importance, the unbounded nature of lines and the failure of the triangle inequality for point-to-line distances undermine the structural properties required for theoretical analysis. As a result, the theoretical foundation of line-based clustering remains far less developed than that of point-based clustering. In this paper, we study the $k$-median of lines problem and address these challenges within the learning-augmented paradigm, which leverages auxiliary information in the form of predicted labels to guide clustering. Specifically, we propose a new learning-augmented algorithm for the $k$-median of lines problem in which the auxiliary labels are exploited to guide the sampling process, and an anchor set induced by line pairs is introduced to guarantee the inclusion of high-quality representative centers. We prove that the proposed algorithm achieves a $(1+O(\alpha))$-approximation in time $O\left(\tfrac{9^d}{(1-2\alpha)^2}n\log n\ln \tfrac{k}{\theta}\right)$ for the $k$-median of lines problem. In particular, in low-dimensional Euclidean space, the algorithm obtains a $(1+O(\alpha))$-approximation in time near-linear in the input size. Experimental results demonstrate that our algorithm consistently outperforms existing approaches in both solution quality and computational efficiency.
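For readers unfamiliar with the objective, the following is a minimal sketch of how a $k$-median of lines cost could be evaluated. It assumes the standard formulation in which each input line is charged its Euclidean distance to the nearest of $k$ point centers. The abstract does not spell this out, so the formulation, the `(point, direction)` line representation, and the function names `point_line_distance` and `k_median_of_lines_cost` are illustrative assumptions, not the paper's algorithm.

```python
import math

def point_line_distance(c, p, u):
    """Euclidean distance from point c to the line {p + t*u : t real}.

    Assumption: a line is given by a point p on it and a nonzero direction u.
    """
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]                      # normalize the direction
    w = [ci - pi for ci, pi in zip(c, p)]            # vector from p to c
    t = sum(wi * ui for wi, ui in zip(w, u))         # projection length onto u
    r = [wi - t * ui for wi, ui in zip(w, u)]        # component orthogonal to the line
    return math.sqrt(sum(x * x for x in r))

def k_median_of_lines_cost(lines, centers):
    """Sum over lines of the distance to the closest center (assumed objective)."""
    return sum(
        min(point_line_distance(c, p, u) for c in centers)
        for p, u in lines
    )

# Hypothetical toy instance in the plane: three lines, two centers.
lines = [
    ((0.0, 0.0), (1.0, 0.0)),   # the x-axis
    ((0.0, 1.0), (1.0, 0.0)),   # the line y = 1
    ((5.0, 0.0), (0.0, 1.0)),   # the line x = 5
]
centers = [(0.0, 0.0), (5.0, 3.0)]
cost = k_median_of_lines_cost(lines, centers)
```

In this toy instance the first and third lines pass through a center, so only the line $y = 1$ contributes to the cost.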
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 6640