Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding

13 Jun 2021 · OpenReview Archive Direct Upload · Readers: Everyone
Abstract: Subspace clustering is a popular method for discovering the underlying low-dimensional structures of high-dimensional multimedia data (e.g., images, videos, and texts). In this article, we consider a large-scale subspace clustering (LS²C) problem, that is, partitioning a million data points with a million dimensions. To address it, we explore an independent distributed and parallel framework that divides the big data and variable matrices, as well as the regularization, by both columns and rows. Specifically, LS²C is decomposed into many independent subproblems by distributing these matrices across different machines by columns, since the regularizer of the code matrix (e.g., the squared Frobenius norm or the ℓ₁-norm) equals the sum of the regularizers of its submatrices. Consensus optimization is designed to solve these subproblems in parallel while saving communication costs. Moreover, we provide theoretical guarantees that LS²C can recover consensus subspace representations of high-dimensional data points under broad conditions. Compared with state-of-the-art LS²C methods, our approach achieves better clustering results on public datasets, including a million images and videos.
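The column-wise decomposition rests on the regularizer splitting over column blocks: for C = [C_1, ..., C_k], ||C||_F² = Σ_i ||C_i||_F² and ||C||_1 = Σ_i ||C_i||_1. The NumPy sketch below illustrates this under a stated assumption: a least-squares self-expressive model, min_C ||X − XC||_F² + λ||C||_F², chosen for illustration because it has a closed form. The abstract does not give the exact objective. The helper names `lsr_codes` and `blockwise_codes`, the synthetic data, and λ = 0.1 are hypothetical. The sketch shows that solving each column block on its own reproduces the full solution. Each block still reads the whole dictionary X, and neither the row-wise division nor the consensus optimization step is modeled here.

```python
import numpy as np

# ASSUMPTION: least-squares self-expressive model
#   min_C ||X - X C||_F^2 + lam * ||C||_F^2
# used only for illustration; not confirmed as the paper's objective.


def lsr_codes(X, targets, lam):
    """Hypothetical helper: argmin_C ||targets - X C||_F^2 + lam ||C||_F^2.

    Closed form: C = (X^T X + lam I)^{-1} X^T targets.
    """
    n = X.shape[1]
    gram = X.T @ X + lam * np.eye(n)
    return np.linalg.solve(gram, X.T @ targets)


def blockwise_codes(X, lam, num_blocks):
    """Hypothetical helper: solve for each column block of C independently.

    Each block stands in for a subproblem that could run on its own machine;
    here the blocks simply run one after another.
    """
    n = X.shape[1]
    blocks = np.array_split(np.arange(n), num_blocks)  # contiguous column blocks
    parts = [lsr_codes(X, X[:, idx], lam) for idx in blocks]
    return np.hstack(parts), blocks


rng = np.random.default_rng(0)
# Synthetic data: 60 points in R^20 drawn from three 3-dimensional subspaces.
bases = [rng.standard_normal((20, 3)) for _ in range(3)]
X = np.hstack([B @ rng.standard_normal((3, 20)) for B in bases])
lam = 0.1

C_full = lsr_codes(X, X, lam)                   # one centralized solve
C_dist, blocks = blockwise_codes(X, lam, num_blocks=4)

# The squared-Frobenius and l1 regularizers are sums over the column blocks.
fro_sq_blocks = sum(np.linalg.norm(C_dist[:, idx], "fro") ** 2 for idx in blocks)
l1_blocks = sum(np.abs(C_dist[:, idx]).sum() for idx in blocks)

print(np.allclose(C_full, C_dist))
print(np.isclose(np.linalg.norm(C_dist, "fro") ** 2, fro_sq_blocks))
print(np.isclose(np.abs(C_dist).sum(), l1_blocks))
```

Under this assumed model, column j of C appears only in the residual for data point j, so the objective separates exactly by columns. That is why the block solves match the centralized one. An ℓ₁ regularizer would separate the same way but would need an iterative solver instead of a closed form.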