Abstract: In this paper, we study random projection based on sparse $\{0,\pm1\}$ matrices, which has been widely applied in classification to reduce data dimension. A key question is to estimate the optimal sparsity of such matrices for classification, namely the minimum number of nonzero entries $\pm1$ per row that suffices to achieve the best classification performance. To this end, we analyze the impact of matrix sparsity on the $\ell_1$ distance between projected data points. By principal component analysis, a larger distance between projected data points better captures the variation in the original data and thus tends to yield better classification performance. Theoretically, the $\ell_1$ distance between projected data points depends not only on the sparsity of the projection matrix but also on the distribution of the original data. Without loss of generality, we evaluate two typical data distributions, the Gaussian mixture distribution and the two-point distribution, which have been widely used to model real data. Under these two distributions, we prove that the maximum $\ell_1$ distance between projected data points can be approximately achieved when the sparse matrix contains only one, or at most about twenty, nonzero entries per row, provided the projection dimension satisfies $m\geq\mathcal{O}(\sqrt{n})$. Accordingly, the best classification performance should also be attained under these conditions. This is confirmed with extensive experiments on different types of data, including image, text, gene, and binary quantization data.
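To make the setup concrete, below is a minimal sketch (not the authors' code) of the sparse $\{0,\pm1\}$ projection described in the abstract: each row of the $m\times n$ matrix holds exactly $s$ nonzero entries with random signs, and two projected points are compared by their $\ell_1$ distance. The function name, the toy Gaussian data, and the parameter choices are illustrative assumptions.

```python
import numpy as np

def sparse_projection_matrix(m, n, s, rng=None):
    """Build an m x n matrix with exactly s nonzero entries per row,
    each +1 or -1 with equal probability; all other entries are zero.
    `s` is the per-row sparsity studied in the paper (e.g. s = 1)."""
    rng = np.random.default_rng(rng)
    R = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=s, replace=False)   # positions of nonzeros
        R[i, cols] = rng.choice([-1.0, 1.0], size=s)  # random signs
    return R

# Toy illustration: project two points and compare their l1 distance.
rng = np.random.default_rng(0)
n, m, s = 1000, 64, 1          # original dim, projected dim, row sparsity
x = rng.normal(size=n)
y = rng.normal(size=n)
R = sparse_projection_matrix(m, n, s, rng=rng)
d_proj = np.abs(R @ x - R @ y).sum()  # l1 distance after projection
```

With $s=1$, computing $Rx$ amounts to sampling $m$ signed coordinates of $x$, which is why very sparse matrices are attractive computationally; the abstract's claim is that such extreme sparsity already approximates the maximum attainable $\ell_1$ separation.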
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: The supplementary material has been slightly revised for readability.
Assigned Action Editor: ~Bamdev_Mishra1
Submission Number: 2288