Keywords: Dimensionality reduction, manifold learning, representation learning, nonlinear embeddings, feature extraction, t-SNE, UMAP, autoencoders, high-dimensional data, visualization
Abstract: The efficient processing of increasingly feature-dense data has become a critical area of research in both industry and academia. Applications such as data visualization, embedded neural network inference, and the reduction of computational complexity are founded upon the ability to project data into lower dimensions with minimal neighborhood distortion. Current statistical algorithms such as UMAP, t-SNE, and LLE rely on assumptions about the data distribution and require non-trivial hyperparameter tuning. Consequently, out-of-sample projections are non-deterministic and the corresponding reduced axes are non-interpretable. In this paper we propose the framework of Geometric Dimensionality Reduction, a novel technique utilizing algebraic and topological symmetries to semi-reversibly and logarithmically compress data. Preconditioning metrics for reducing projection distortion, improving clustering accuracy, and enabling dimension reversibility are proposed as well. Additionally, we demonstrate a closed-form projection for $\mathbb{R}^4\mapsto\mathbb{R}^3$ and compare results with UMAP and t-SNE. Finally, we discuss open challenges and provide a complete framework for $\mathbb{R}^{2^n}\mapsto\mathbb{R}^{n+1}$ dimensionality reduction.
Primary Area: optimization
Submission Number: 21515