Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein

TMLR Paper4144 Authors

05 Feb 2025 (modified: 07 Feb 2025) | Under review for TMLR | CC BY 4.0
Abstract: Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction (DR) methods to project data onto lower-dimensional spaces or organizing points into meaningful clusters (clustering). In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem. We empirically demonstrate its relevance to the identification of low-dimensional prototypes representing data at different scales, across multiple image and genomic datasets.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Ehsan_Amid1
Submission Number: 4144