$k$-PCA for (non-squared) Euclidean Distances: Deterministic Polynomial Time Approximation

Published: 03 Feb 2026, Last Modified: 06 Feb 2026AISTATS 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Abstract: Given an integer $k\geq1$ and a set $P$ of $n$ points in $\mathbb{R}^d$, the classic $k$-PCA (Principal Component Analysis) approximates the affine \emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances ($\ell_{2,2}$-norm) over the points of $P$, i.e., the mean of these distances. The \emph{$k$-subspace median} is the subspace that minimizes its sum of (non-squared) Euclidean distances ($\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually more sparse and robust to noise/outliers than the mean, but also much harder to approximate since, unlike the $\ell_{z,z}$ (non-mixed) norms, it is non-convex for $k<d-1$. We provide the first polynomial-time deterministic algorithm whose both running time and approximation factor are not exponential in $k$. More precisely, the multiplicative approximation factor is $\sqrt{d}$, and the running time is polynomial in the size of the input. We expect that our technique would be useful for many other related problems, such as $\ell_{2,z}$ norm of distances for $z\not \in \{1,2\}$, e.g., $z=\infty$, and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.
Submission Number: 333
Loading