Abstract: In this paper, we study sparse $\{0,\pm1\}$-matrix based random projection, which has been widely applied in classification to reduce the data dimension. For such sparse matrices, it is of computational interest to explore the minimum number of nonzero entries $\pm1$ needed to achieve the best, or nearly the best, classification performance. To this end, we analyze the impact of matrix sparsity on the $\ell_1$ distance between projected data points. The analysis is inspired by principal component analysis, which suggests that a larger distance between projected data points better captures the variation among the original data and thus yields better classification performance. Theoretically, the $\ell_1$ distance between projected data points depends not only on the sparsity of the matrix but also on the distribution of the original data. Without loss of generality, we consider two typical data distributions, the Gaussian mixture distribution and the two-point distribution, which have been widely used to model real data. Under these two distributions, we estimate the $\ell_1$ distance between projected data points. We find that sparse matrices with only one, or at most a few dozen, nonzero entries per row can provide $\ell_1$ distances comparable to or even larger than those of denser matrices, provided the matrix size satisfies $m\geq\mathcal{O}(\sqrt{n})$. Accordingly, a similar trend should also hold for classification performance. This is confirmed by classification experiments on real data of different types, including image, text, gene, and binary quantization data.
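To make the construction concrete, below is a minimal sketch (assuming NumPy; the dimensions, data, and $1/\sqrt{k}$ normalization are illustrative assumptions, not taken from the paper) of a sparse $\{0,\pm1\}$ projection matrix with exactly $k$ nonzero entries per row, and of the $\ell_1$ distance between two projected points for several sparsity levels $k$:

```python
import numpy as np

def sparse_pm1_projection(m, n, k, rng=None):
    """Build an m x n matrix with exactly k nonzero +/-1 entries per row (all other entries are 0)."""
    rng = np.random.default_rng(rng)
    R = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=k, replace=False)    # positions of the k nonzeros in row i
        R[i, cols] = rng.choice([-1.0, 1.0], size=k)   # random +/-1 signs
    return R

rng = np.random.default_rng(0)
n, m = 1000, 64                                  # original and projected dimensions (illustrative; m ~ O(sqrt(n)) or larger)
x, y = rng.normal(size=n), rng.normal(size=n)    # two stand-in data points
for k in (1, 20, 200, n):                        # k = n corresponds to a fully dense +/-1 matrix
    R = sparse_pm1_projection(m, n, k, rng)
    d = np.abs(R @ (x - y)).sum() / np.sqrt(k)   # l1 distance between projections, 1/sqrt(k) normalization (assumed)
    print(f"k = {k:4d}: normalized l1 distance = {d:.2f}")
```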
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=Z5j4ydrDRy&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: Dear Dr. Sprechmann,
We would like to express our sincere gratitude to you and the previous AE and reviewers for taking the time to review our manuscript.
**This is a resubmission (encouraged by the AE) of the previously rejected manuscript TMLR 2288. To facilitate the review, if possible, please assign the manuscript to the previous anonymous AE and reviewers.**
**The major contribution of the work has been acknowledged by the previous AE and reviewers**, as stated in the AE's concluding remarks: "All the reviewers have given yes to both the claims and evidence questions. Similarly, all the reviewers believe that the work itself is interesting."
**The primary deficiency of the previous manuscript**, as commented by the AE, is that our major claim was derived from the numerical analysis (P3-P6) of the theoretical results in Theorems 1 and 2, rather than from *direct proofs*. The AE therefore kindly suggested that we "tone down such sentences or have more rigorous arguments to support those claims".
Following this suggestion, in the revised manuscript **we have addressed the problem by providing more rigorous arguments (Eqs. (7) and (13) in Theorems 1 and 2) to support our major claim:** sparse matrices with a few nonzero entries per row can provide classification performance comparable to that of denser matrices.
The new additions to the manuscript (highlighted in red) mainly include Eqs. (7), (8), (13), and (14) and the accompanying discussion, briefly summarized as follows.
1) In Theorems 1 and 2, we have further analyzed the convergence error of the expected $\ell_1$ distance $\mathbf{E}|r^\top x|$ under finite matrix sparsity $k$, as detailed in Eqs. (7) and (13).
2) In the Remarks of Theorems 1 and 2, using Eqs. (7) and (13), we have further derived the lower bound on $k$ that keeps the convergence error ratio below any given positive constant $\eta$, as shown in Eqs. (8) and (14). For $k$ above the bound derived for small $\eta$, $\mathbf{E}|r^{\top}x|$ takes similar values across different $k$, and accordingly, different $k$ should also yield similar classification performance. We thus reach *the conclusion* that sparse matrices with small $k$ (around the lower bound) provide classification performance comparable to that of denser matrices with larger $k$.
3) Note that the smaller the lower bound on $k$ (derived under smaller error ratios), the sparser the matrix structure we can obtain. In Fig. 2, we have examined the lower bound on $k$ that ensures small error ratios by computing the convergence error ratios for different $k$. It is found that error ratios close to zero can already be achieved with small $k$, such as $k\geq 20$, corresponding to sparse matrix structures.
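As a small, self-contained illustration of this convergence-in-$k$ behavior (an independent sketch under the two-point data model, not a reproduction of Eqs. (7)/(13) or Fig. 2; the $1/\sqrt{k}$ normalization and the chosen values of $k$ are our own assumptions): for two-point data $x_j=\pm1$ and a row $r$ with $k$ random $\pm1$ entries, $r^\top x$ is a sum of $k$ independent $\pm1$ terms, so $\mathbf{E}|r^\top x|$ can be computed exactly and compared with its dense (Gaussian) limit $\sqrt{2k/\pi}$:

```python
from math import comb, sqrt, pi

def normalized_mean_abs(k):
    """Exact E|r^T x| / sqrt(k) for two-point data (x_j = +/-1) and a row r with k
    random +/-1 entries: r^T x is a sum of k independent +/-1 terms, i.e.
    |r^T x| = |k - 2 * Binomial(k, 1/2)|."""
    return sum(abs(k - 2 * j) * comb(k, j) for j in range(k + 1)) / 2**k / sqrt(k)

dense_limit = sqrt(2 / pi)   # value approached as k grows large (a fully dense +/-1 row)
for k in (1, 5, 20, 100, 1000):
    val = normalized_mean_abs(k)
    gap = abs(val - dense_limit) / dense_limit
    print(f"k = {k:5d}: E|r^T x|/sqrt(k) = {val:.4f}, relative gap to dense limit = {gap:.2%}")
```

Running this sketch, the relative gap drops from about 25% at $k=1$ to roughly 1% at $k=20$ and well below 1% for larger $k$, which is consistent in spirit with the small error ratios reported for $k\geq 20$.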
Assigned Action Editor: ~Bamdev_Mishra1
Submission Number: 3167