An Efficient Protocol for Distributed Column Subset Selection in the Entrywise $\ell_p$ Norm

Shuli Jiang; Dongyu Li; Irene Mengze Li; Arvind V. Mahankali; David Woodruff

28 Sept 2020 (modified: 05 May 2023)

ICLR 2021 Conference Blind Submission

Readers: Everyone

Keywords: Column Subset Selection, Distributed Learning

Abstract: We give a distributed protocol with nearly optimal communication and number of rounds for Column Subset Selection with respect to the entrywise $\ell_1$ norm ($k$-CSS$_1$), and more generally, for the $\ell_p$ norm with $1 \leq p < 2$. We study matrix factorization under $\ell_1$-norm loss, rather than the more standard Frobenius-norm loss, because the $\ell_1$ norm is more robust to noise, which has been observed to improve performance in a wide range of computer vision and robotics problems. In the distributed setting, we consider $s$ servers in the standard coordinator model of communication, where the columns of the input matrix $A \in \mathbb{R}^{d \times n}$ ($n \gg d$) are distributed across the $s$ servers. We give a protocol in this model with $\widetilde{O}(sdk)$ communication, $1$ round, and polynomial running time, which achieves a multiplicative $k^{\frac{1}{p} - \frac{1}{2}}\,\mathrm{poly}(\log nd)$-approximation to the best possible column subset. A key ingredient in our proof is a reduction to the $\ell_{p,2}$ norm, which is the $p$-norm of the vector of Euclidean norms of the columns of $A$. This enables us to use strong coreset constructions for Euclidean norms, which had not previously been used in this context. It also naturally allows us to implement our algorithm in the popular streaming model of computation. We further propose a greedy algorithm for selecting columns, which can be used by the coordinator, and show the first provable guarantees for a greedy algorithm for the $\ell_{1,2}$ norm. Finally, we implement our protocol and demonstrate significant practical advantages on real-world data analysis tasks.
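To make the two norms named in the abstract concrete, the following is a minimal sketch in plain Python that computes the entrywise $\ell_p$ norm and the $\ell_{p,2}$ norm of a small matrix. The function names `entrywise_lp_norm` and `lp2_norm` and the example matrix are illustrative assumptions, not taken from the paper or its code. The final check uses the standard vector inequality $\|x\|_2 \leq \|x\|_p \leq d^{\frac{1}{p} - \frac{1}{2}} \|x\|_2$ for $x \in \mathbb{R}^d$ and $1 \leq p < 2$, applied column by column; it is shown only to illustrate how the two norms relate, not as the paper's argument.

```python
import math

def entrywise_lp_norm(A, p):
    """Entrywise l_p norm: (sum over all entries of |A_ij|^p)^(1/p).

    A is a list of rows (d rows, n columns). Hypothetical helper name.
    """
    return sum(abs(x) ** p for row in A for x in row) ** (1.0 / p)

def lp2_norm(A, p):
    """l_{p,2} norm: the p-norm of the vector of Euclidean column norms.

    Hypothetical helper name; follows the definition given in the abstract.
    """
    d, n = len(A), len(A[0])
    col_norms = [
        math.sqrt(sum(A[i][j] ** 2 for i in range(d))) for j in range(n)
    ]
    return sum(c ** p for c in col_norms) ** (1.0 / p)

# Illustrative 2 x 3 matrix (d = 2 rows, n = 3 columns).
A = [[3.0, 0.0, 1.0],
     [4.0, 2.0, 1.0]]

p = 1
d = len(A)
l1 = entrywise_lp_norm(A, p)   # sum of absolute values of all entries
l12 = lp2_norm(A, p)           # sum of Euclidean norms of the columns

# Column-wise norm inequality: l_{p,2} <= entrywise l_p <= d^(1/p - 1/2) * l_{p,2}
assert l12 <= l1 <= d ** (1.0 / p - 0.5) * l12
```

For $p = 1$ the $\ell_{1,2}$ norm is simply the sum of the Euclidean lengths of the columns, which is the quantity the abstract's greedy guarantee refers to.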

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip

Reviewed Version (pdf): https://openreview.net/references/pdf?id=T2G-LHJvDA
