- Keywords: representation learning, robust subspace recovery, dual principal component pursuit, outliers, model selection
- Abstract: Robust subspace recovery (RSR) is a fundamental problem in robust representation learning. Although RSR has received considerable attention in the literature, several aspects of it remain largely unexplored. In particular, when the dimension of the underlying subspace is unknown and the data contain a significant number of outlying entries, many methods struggle to identify the correct subspace. Here we focus on a recently proposed RSR method, the Dual Principal Component Pursuit (DPCP) approach, which aims to recover a basis of the orthogonal complement of the subspace. While prior work has shown that DPCP can provably recover the correct subspace in the presence of outliers, this guarantee relies on knowing the true dimension of the subspace, which is typically not possible in practice, and DPCP often fails when this dimension is unknown. Instead, we propose a very simple algorithm based on running multiple instances of a projected sub-gradient descent method (PSGM), with each problem instance seeking to find one vector in the null space of the subspace. We show that, under mild conditions, this approach succeeds with high probability. In particular, we show that 1) all of the problem instances converge to a vector in the null space of the subspace and 2) the ensemble of problem instance solutions is sufficiently diverse to fully span the null space of the subspace (and thus reveal the true codimension of the subspace), even when the true subspace dimension is unknown. We provide empirical results that corroborate our theoretical results and showcase the remarkable implicit rank regularization behavior of the PSGM algorithm, which allows us to perform RSR without knowing the subspace dimension.
- One-sentence Summary: We study the robust subspace recovery problem when the subspace codimension is unknown.
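
The multi-instance idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard DPCP formulation from the literature, in which each instance minimizes `||X.T @ b||_1` over unit-norm `b`, and it estimates the codimension as the numerical rank of the stacked solutions. The function names (`psgm_instance`, `recover_complement`), the geometric step-size schedule, the iteration count, the rank tolerance, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def psgm_instance(X, b0, step=1.0, decay=0.97, iters=800):
    """One PSGM run on min ||X.T @ b||_1 subject to ||b||_2 = 1."""
    n_points = X.shape[1]
    b = b0 / np.linalg.norm(b0)
    for _ in range(iters):
        # Subgradient of the l1 objective, averaged over the data points.
        g = X @ np.sign(X.T @ b) / n_points
        b = b - step * g
        b = b / np.linalg.norm(b)  # project back onto the unit sphere
        step *= decay  # geometrically diminishing step (assumed schedule)
    return b


def recover_complement(X, n_instances, rng, rel_tol=1e-6):
    """Run independent, randomly initialized instances and stack the results."""
    ambient_dim = X.shape[0]
    B = np.column_stack([
        psgm_instance(X, rng.standard_normal(ambient_dim))
        for _ in range(n_instances)
    ])
    # The numerical rank of B serves as the codimension estimate.
    s = np.linalg.svd(B, compute_uv=False)
    codim = int(np.sum(s > rel_tol * s[0]))
    return B, codim


# Synthetic check (assumed setup): a 7-dimensional subspace of R^10,
# so the true codimension is 3, with 300 inliers and 30 outliers.
rng = np.random.default_rng(0)
D, d, n_in, n_out = 10, 7, 300, 30
U, _ = np.linalg.qr(rng.standard_normal((D, d)))  # orthonormal basis of the subspace
inliers = U @ rng.standard_normal((d, n_in))
outliers = rng.standard_normal((D, n_out))
X = np.hstack([inliers, outliers])
X = X / np.linalg.norm(X, axis=0)  # unit-norm columns

# More instances (6) than the true codimension (3), which is treated as unknown.
B, codim_est = recover_complement(X, n_instances=6, rng=rng)
print("estimated codimension:", codim_est)
print("max |U.T @ B|:", np.abs(U.T @ B).max())
```

In this sketch the number of instances only needs to be at least the true codimension. If every instance lands in the orthogonal complement and the solutions are sufficiently diverse, the rank of `B` reveals the codimension without it being supplied as an input.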