The Sparse Matrix-Based Random Projection: A Study of Binary and Ternary Quantization

Published: 18 Feb 2025, Last Modified: 19 Feb 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Random projection is a simple yet effective technique for dimension reduction, widely used in various machine learning tasks. Following the projection step, quantization is often applied to further reduce the complexity of the projected data. In general, quantized projections are expected to approximately preserve the pairwise distances between the original data points, so as to avoid significant performance degradation in subsequent tasks. While this distance preservation property has been investigated for Gaussian matrices, our work extends the analysis to hardware-friendly $\{0,1\}$-binary matrices, focusing in particular on cases where the projections are quantized into two types of low bit-width codes: $\{0,1\}$-binary codes and $\{0,\pm1\}$-ternary codes. We find that the distance preservation property tends to be better maintained when the binary projection matrices are sparse. This is validated through classification and clustering experiments, in which extremely sparse binary matrices, with only one nonzero entry per column, achieve performance superior or comparable to denser binary matrices and Gaussian matrices. This presents an opportunity to significantly reduce the computational and storage complexity of the quantized random projection model without compromising, and potentially even improving, its performance.
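To make the pipeline in the abstract concrete, below is a minimal sketch (not the authors' reference implementation; see the linked repository for that) of quantized random projection with an extremely sparse $\{0,1\}$-binary matrix having one nonzero entry per column. The matrix orientation, the function names, and the binary/ternary quantization thresholds are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sparse_binary_projection(d, k, rng=None):
    """k x d {0,1} matrix with exactly one nonzero entry per column,
    i.e. each original coordinate is routed to a single output coordinate.
    (Orientation and construction are assumptions for illustration.)"""
    rng = np.random.default_rng(rng)
    R = np.zeros((k, d))
    R[rng.integers(0, k, size=d), np.arange(d)] = 1.0
    return R

def binary_quantize(Y, threshold=0.0):
    """{0,1} codes: 1 where a projected value exceeds the threshold."""
    return (Y > threshold).astype(np.int8)

def ternary_quantize(Y, tau):
    """{0,+1,-1} codes: 0 inside the dead zone [-tau, tau], sign outside."""
    return (np.sign(Y) * (np.abs(Y) > tau)).astype(np.int8)

# Usage: project n points from dimension d down to k, then quantize.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 256))              # n x d data
R = sparse_binary_projection(d=256, k=64, rng=0)  # sparse binary matrix
Y = X @ R.T                                       # n x k projections
B = binary_quantize(Y, threshold=np.median(Y))    # {0,1} codes
T = ternary_quantize(Y, tau=0.5 * np.std(Y))      # {0,+/-1} codes
```

Because each column of the sparse matrix has a single nonzero entry, the projection reduces to summing a random subset of the original coordinates per output dimension, which is what makes the scheme cheap in both computation and storage.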
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: In this camera-ready revision, we have incorporated a new author, Zhongzheng Li (lizhongzheng@mail.sdu.edu.cn), with the editors' consent. Furthermore, we have also included an Acknowledgements section.
Code: https://anonymous.4open.science/r/The-Sparse-Matrix-Based-Random-Projection-D13D
Assigned Action Editor: ~Jeff_Phillips1
Submission Number: 3480