Partition First, Embed Later: Laplacian-Based Feature Partitioning for Refined Embedding and Visualization of High-Dimensional Data
Keywords: data visualization, dimensionality reduction, manifold learning, data embedding, feature partitioning
TL;DR: We present a feature partitioning approach for embedding and visualizing multiple low-dimensional structures within high-dimensional data
Abstract: The utility of embedding and visualization techniques for high-dimensional data in exploratory analysis is well-established. However, when the data embody intricate structures governed by multiple latent variables, standard techniques may distort or even mask part of the phenomenon under study. This paper explores scenarios where the observed features can be partitioned into mutually exclusive subsets, each capturing a different smooth substructure. In such cases, visualizing the data based on each feature partition can better characterize the underlying processes and structures in the data, leading to improved interpretability. To partition the features, we propose solving an optimization problem that promotes a graph Laplacian-based smoothness in each partition, thus prioritizing partitions with simpler geometric structures. Our approach generalizes traditional embedding and visualization techniques such as t-distributed Stochastic Neighbor Embedding and Diffusion Maps, allowing them to learn multiple embeddings simultaneously. We establish that if several independent or partially dependent manifolds are embedded in distinct feature subsets in high-dimensional space, then our framework can reliably identify the correct subsets with theoretical guarantees. Finally, we demonstrate the effectiveness of our approach in extracting multiple low-dimensional structures and partially independent processes from both simulated and real data.
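The core intuition — that the correct feature partition yields geometrically simpler (smoother) substructures than a mixed one — can be illustrated with a toy sketch. The snippet below is not the authors' algorithm; it uses a simple proxy (mean squared distance to the k nearest neighbours, proportional to the graph-Laplacian quadratic form of the coordinates on an unweighted kNN graph) and hypothetical helper names (`knn_quadratic_form`, `partition_score`), assuming two independent circles embedded in disjoint feature subsets:

```python
import numpy as np

def knn_quadratic_form(Z, k=10):
    """Mean squared distance to the k nearest neighbours of each row of Z.
    This is proportional to the Laplacian quadratic form of the coordinates
    on the (directed, unweighted) kNN graph built from Z itself, and serves
    as a crude proxy for how smooth / low-dimensional the subset is."""
    n = Z.shape[0]
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    nbrs = np.argsort(D2, axis=1)[:, 1:k + 1]            # skip self (col 0)
    return float(np.mean(D2[np.arange(n)[:, None], nbrs]))

def partition_score(X, parts, k=10):
    """Sum of per-partition smoothness proxies (lower = simpler geometry)."""
    return sum(knn_quadratic_form(X[:, p], k) for p in parts)

# Two independent 1-D manifolds (circles) hidden in disjoint feature subsets.
rng = np.random.default_rng(0)
n = 300
t, s = rng.uniform(0, 2 * np.pi, (2, n))     # independent latent variables
X = np.c_[np.cos(t), np.sin(t),              # circle A lives in features 0-1
          np.cos(s), np.sin(s)]              # circle B lives in features 2-3
X += 0.01 * rng.standard_normal(X.shape)     # small observation noise

good = partition_score(X, [[0, 1], [2, 3]])  # matches the true subsets
bad = partition_score(X, [[0, 2], [1, 3]])   # mixes the two manifolds
print(good < bad)
```

Because each mixed subset spans a 2-D region rather than a 1-D curve, its kNN distances are larger, so the correct partition attains the lower score. The paper's actual method replaces this exhaustive comparison with an optimization problem over partitions, with recovery guarantees.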
Supplementary Material: pdf
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11090