Keywords: dimensionality reduction, data embedding, manifold learning
Abstract: Dimensionality reduction (DR) and visualization of high-dimensional data are of theoretical and practical value in machine learning and related fields. In theory, there exists an intriguing, non-intuitive discrepancy between the geometry of high-dimensional and low-dimensional space. Based on this discrepancy, we propose a novel DR and visualization method called Space-based Manifold Approximation and Projection (SpaceMAP). Our method establishes a quantitative space transformation to address the "crowding problem" in DR: with the proposed equivalent extended distance (EED) and function distortion (FD) theory, we are able to match the capacity of high-dimensional and low-dimensional space in a principled manner. To handle complex high-dimensional data with different manifold properties, SpaceMAP distinguishes between the near field, middle field, and far field of the data distribution in a data-specific, hierarchical manner. We evaluated SpaceMAP on a range of artificial and real datasets with different manifold properties and found that it performs well in comparison with classical and state-of-the-art DR methods. In addition, the concept of space distortion provides a generic framework for understanding nonlinear DR methods such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP).
One-sentence Summary: We propose a novel dimensionality reduction (DR) method using space geometry and hierarchical manifold approximation.
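The geometric discrepancy behind the "crowding problem" can be illustrated with a standard fact about ball volumes: the volume of a d-dimensional ball scales as r^d, so the share of volume lying far from the center grows rapidly with d. The sketch below is only an illustration of that general fact; it is not the EED or FD construction of SpaceMAP, and the function name `outer_shell_fraction` is a hypothetical helper introduced here.

```python
def outer_shell_fraction(dim: int, inner_ratio: float = 0.5) -> float:
    """Fraction of a dim-dimensional ball's volume lying outside radius inner_ratio * R.

    Ball volume scales as r**dim, so the inner ball holds inner_ratio**dim
    of the total volume and the outer shell holds the rest.
    (Illustrative helper, not part of SpaceMAP.)
    """
    if dim < 1:
        raise ValueError("dim must be a positive integer")
    if not 0.0 <= inner_ratio <= 1.0:
        raise ValueError("inner_ratio must lie in [0, 1]")
    return 1.0 - inner_ratio ** dim


if __name__ == "__main__":
    # Compare how much "room" sits in the outer half-radius shell as dim grows.
    for dim in (2, 3, 10, 100):
        print(f"dim={dim:>3}  outer-shell fraction={outer_shell_fraction(dim):.6f}")
```

In two dimensions the outer shell holds three quarters of the volume, while in ten dimensions it holds more than 99.9 percent. A 2D embedding therefore has far less room at moderate distances than the original space, which is the kind of capacity mismatch a DR method has to compensate for.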