Asymptotic Optimality of Self-Representative Low-Rank Approximation and Its Applications

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission · Readers: Everyone
Keywords: data selection, low rank approximation, column subset selection
Abstract: We propose a novel technique for sampling representatives from a large, unsupervised dataset. The approach is based on the concept of {\em self-rank}, defined as the minimum number of samples needed to reconstruct all samples with an accuracy comparable to that of the rank-$K$ approximation. Since the exact computation of self-rank requires a computationally expensive combinatorial search, we propose an efficient algorithm that jointly estimates self-rank and selects the optimal samples with high accuracy. We derive a theoretical upper bound that reaches the tightest possible value in two asymptotic cases. The previously best approximation ratio for self-representative low-rank approximation was presented at ICML 2017~\cite{Chierichetti-icml-2017} and was later improved to $\sqrt{1+K}$ at NeurIPS 2019~\cite{dan2019optimal}; both of these bounds depend solely on the number of selected samples. In this paper, for the first time, we present an adaptive approximation ratio that depends on the spectral properties of the original dataset $\boldsymbol{A}\in \mathbb{R}^{N\times M}$. In particular, our performance bound is governed by the condition number $\kappa(\boldsymbol{A})$: the derived approximation ratio is $1+(\kappa(\boldsymbol{A})^2-1)/(N-K)$, which approaches $1$ in two asymptotic cases. In addition to evaluating the proposed algorithm on a synthetic dataset, we show that the proposed sampling scheme can be utilized in real-world applications such as graph node sampling for optimizing the shortest-path criterion and learning a classifier from sampled data.
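To illustrate how the adaptive approximation ratio behaves, the minimal sketch below evaluates $1+(\kappa(\boldsymbol{A})^2-1)/(N-K)$ on synthetic matrices; the random matrix construction and parameter values are illustrative assumptions, not the paper's algorithm or experimental setup.

```python
import numpy as np

def adaptive_ratio(A, K):
    """Evaluate the bound 1 + (kappa(A)^2 - 1) / (N - K) from the abstract."""
    N = A.shape[0]
    kappa = np.linalg.cond(A)  # condition number sigma_max / sigma_min
    return 1.0 + (kappa**2 - 1.0) / (N - K)

# Illustrative example (assumption): tall Gaussian matrices become better
# conditioned as N grows, so the ratio approaches 1 as N - K increases.
rng = np.random.default_rng(0)
for N in (100, 1_000, 10_000):
    A = rng.standard_normal((N, 50))
    print(N, adaptive_ratio(A, K=10))
```

For these synthetic matrices the printed ratio shrinks toward $1$ as $N$ grows, matching the asymptotic behavior claimed for the bound.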
One-sentence Summary: We propose a novel technique for sampling representatives from a large, unsupervised dataset.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=eSM5viKVO