Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic Distance in Manifold Learning

Ziqi Rong; Jinpu Cai; Jiahao Qiu; Pengcheng Xu; Lana Garmire; Qiuyu Lian; Hongyi Xin

Balancing Information Preservation and Computational Efficiency: L2 Normalization and Geodesic Distance in Manifold Learning

Ziqi Rong, Jinpu Cai, Jiahao Qiu, Pengcheng Xu, Lana Garmire, Qiuyu Lian, Hongyi Xin

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: visualization or interpretation of learned representations

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: Normalization, Geodesic Distance, Manifold Learning, Bioinformatics

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Distinguishable metric of similarity plays a fundamental role in unsupervised learning, particularly in manifold learning and high-dimensional data visualization tasks, by which differentiate between observations without labels. However, conventional metrics like Euclidean distance after L1-normalization may fail by losing distinguishable information when handling high-dimensional data, where the distance between different observations gradually converges to a shrinking interval. In this article, we discuss the influence of normalization by different p-norms and the defect of Euclidean distance. We discover that observation differences are better preserved when normalizing data by a higher p-norm and using geodesic distance rather than Euclidean distance as the similarity measurement. We further identify that L2-normalization onto the hypersphere is often sufficient in preserving delicate differences even in relatively high dimensional data while maintaining computational efficiency. Subsequently, we present HS-SNE (HyperSphere-SNE), a hypersphere-representation-system-based augmentation to t-SNE, which effectively addresses the intricacy of high-dimensional data visualization and similarity measurement. Our results show that this hypersphere representation system has improved resolution to identify more subtle differences in high-dimensional data, while balancing information preservation and computational efficiency.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7472

Loading