Keywords: random projection, sparse matrices, dimensionality reduction, mean absolute deviation, classification
Abstract: In this paper, we study the sparse $\{0,\pm1\}$-matrix-based random projection, a technique extensively applied in diverse classification tasks for dimensionality reduction, and the foundational model for each layer of popular deep ternary networks. For these sparse matrices, determining the optimal sparsity level, namely the minimum number of nonzero entries ($\pm1$) needed to achieve optimal or near-optimal classification performance, remains an unresolved challenge. To investigate the impact of matrix sparsity on classification, we propose to analyze the mean absolute deviation (MAD) of projected data points, which quantifies their dispersion. Statistically, a higher degree of dispersion is expected to improve classification performance by capturing more intrinsic variations in the original data. Given that the MAD value depends not only on the sparsity level of random matrices but also on the distribution of the original data, we evaluate two representative data distributions for generality: the Gaussian mixture distribution, widely used to model complex real-world data; and the two-point distribution, suitable for modeling discretized data. Our analysis reveals that sparse matrices with only \textit{one} or \textit{a few} nonzero entries per row can achieve MAD values comparable to, or even exceeding, those of denser matrices, provided the matrix size satisfies $m\geq\mathcal{O}(\sqrt{n})$, where $m$ and $n$ denote the projected and original dimensions, respectively. These extremely sparse matrix structures imply significant computational savings. This finding is further validated through classification experiments on diverse real-world datasets, including images, text, gene data, and binary-quantized data, demonstrating its broad applicability.
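As a rough illustration of the setup described in the abstract, the following Python sketch builds an $m\times n$ sparse $\{0,\pm1\}$ projection matrix with $s$ nonzero entries per row and compares the MAD of the projected points across sparsity levels. The construction, the $1/\sqrt{s}$ scaling, and the MAD definition (mean absolute deviation around the per-dimension mean) are assumptions made for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def sparse_ternary_projection(n, m, s, seed=None):
    """Build an m x n matrix with s nonzero entries per row, each +1 or -1
    with equal probability (assumed construction, for illustration only)."""
    rng = np.random.default_rng(seed)
    R = np.zeros((m, n))
    for i in range(m):
        cols = rng.choice(n, size=s, replace=False)   # positions of nonzeros
        R[i, cols] = rng.choice([-1.0, 1.0], size=s)  # random signs
    return R

def mean_absolute_deviation(Y):
    """MAD of projected points around their mean, averaged over dimensions
    (assumed definition of the dispersion measure)."""
    return np.mean(np.abs(Y - Y.mean(axis=0)))

# Toy example: project data drawn from a two-component Gaussian mixture
# and compare MAD for increasingly dense projection matrices.
rng = np.random.default_rng(0)
n, m, N = 1000, 64, 500                      # original dim, projected dim, #points
X = np.concatenate([rng.normal(-1, 1, (N // 2, n)),
                    rng.normal(+1, 1, (N // 2, n))])

for s in (1, 4, 32, n):                      # from extremely sparse to fully dense
    R = sparse_ternary_projection(n, m, s, seed=1)
    Y = X @ R.T / np.sqrt(s)                 # assumed scaling for comparability
    print(f"s = {s:4d}  MAD of projected data = {mean_absolute_deviation(Y):.3f}")
```

Under these toy assumptions, the printed MAD values for very small $s$ can be compared against the dense case $s=n$, mirroring the comparison the abstract describes.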
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 15370