Coresets for $k$-mean clustering of segments

28 Sept 2024 (modified: 05 Feb 2025), Submitted to ICLR 2025, CC BY 4.0
Keywords: Clustering; $k$-means; Segment clustering; Non-convex optimisation; Coresets
Abstract: The $k$-means of a given set $\mathcal{S}\subseteq \mathbb{R}^d$ of $n$ segments is a set $X\subseteq \mathbb{R}^d$ of $|X|=k$ centers that minimizes the sum of squared distances $D(\mathcal{S},X):=\sum_{S\in \mathcal{S}}\min_{x\in X}D(S,x)$. Here, the distance $D(S,x)$ between a segment $S$ and a point $x$ is the integral $\int_{p\in S}\|p-x\|\,dp$ of the distances to $x$ over all points $p$ on the segment. More generally, the farthest $m$ input points (outliers) may be ignored, other distance functions may be used, such as M-estimators or non-squared distances, and each distance may be multiplied by a function that depends on the size of its cluster, say, to obtain balanced clustering. For a given $\varepsilon>0$, an $\varepsilon$-coreset for all these problems is a weighted subset $C\subseteq \mathcal{S}$ that approximates $D(\mathcal{S},X)$ up to a multiplicative factor of $1\pm\varepsilon$ for every set $X\subseteq\mathbb{R}^d$ of (possibly weighted) $k$ centers. Such a coreset enables handling streaming, big, and distributed input in parallel using existing techniques. We suggest the first coreset construction that, with high probability, returns an $\varepsilon$-coreset $C$ for \emph{any} input set $\mathcal{S}$ of segments. For constant $k,\varepsilon$, the coreset has size $|C|\in O \big(\log^2(n)\big)$ and is computed in $O(nd)$ time. Experimental results and a real-time video tracking application demonstrate the applicability of our algorithm; the latter also shows that our method supports vectorized segments.
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13883
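
The cost function defined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not construct a coreset; it only evaluates $D(\mathcal{S},X)$ and the relative error of a given weighted subset. Since the abstract speaks of a sum of squared distances, the sketch assumes the squared Euclidean norm inside the integral and integrates with respect to arc length; both choices, along with the hypothetical function names `segment_point_cost`, `clustering_cost`, and `relative_error`, are assumptions made here for illustration.

```python
import numpy as np

def segment_point_cost(a, b, x):
    """Integral of ||p - x||^2 over the segment [a, b] w.r.t. arc length.

    Assumption: squared Euclidean norm and arc-length measure.
    """
    a, b, x = (np.asarray(v, dtype=float) for v in (a, b, x))
    u = a - x                      # offset of the first endpoint from x
    v = b - a                      # direction vector of the segment
    length = np.linalg.norm(v)
    # Closed form: integral_0^1 ||u + t v||^2 dt = ||u||^2 + u.v + ||v||^2 / 3
    return length * (u @ u + u @ v + (v @ v) / 3.0)

def clustering_cost(segments, centers, weights=None):
    """D(S, X): weighted sum over segments of the cost to the cheapest center."""
    if weights is None:
        weights = np.ones(len(segments))
    total = 0.0
    for (a, b), w in zip(segments, weights):
        total += w * min(segment_point_cost(a, b, x) for x in centers)
    return total

def relative_error(segments, coreset, coreset_weights, centers):
    """|D(C, X) - D(S, X)| / D(S, X) for one fixed set of centers X."""
    full = clustering_cost(segments, centers)
    approx = clustering_cost(coreset, centers, coreset_weights)
    return abs(approx - full) / full
```

Under these assumptions, a weighted subset $C$ would qualify as an $\varepsilon$-coreset only if `relative_error` stays at most $\varepsilon$ for every choice of $k$ centers, not just for the ones tested; the sketch checks a single query at a time.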