One cluster or two? A Manifold-Based Approach

ICLR 2026 Conference Submission20562 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Clustering, Manifold Inference, Nearest-Neighbor Graphs
Abstract: The manifold hypothesis suggests a natural criterion for clustering: partition data according to the manifold component from which they are drawn. This criterion is useful because, intuitively, the separability of manifold components is governed by the ambient separation between components relative to the largest gap in the sample's coverage. The analysis integrates topology (e.g., manifold volume and reach) with estimation (e.g., fill radius and sample density). Formally, it identifies a criticality: when a threshold is exceeded, nearest-neighbor data graphs avoid bridging edges and clusters are preserved; otherwise, bridges appear and components fuse. Practically, the critical threshold is sandwiched between bounds that imply a measure of cluster confidence and that motivate an algorithm, Manifold-Based Clustering (MBC), which constructs a candidate neighborhood graph. MBC is parameter-light and, unlike density-based methods (e.g., HDBSCAN), avoids hand-tuned scale thresholds. Instead, MBC yields a monotone bracket on the number of components through a natural sweep of neighborhood size. Across curved and high-dimensional benchmarks, MBC matches state-of-the-art accuracy and exposes ambiguity near the critical thresholds.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20562
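
The neighborhood-size sweep mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MBC algorithm, whose details are not given on this page; it assumes a plain undirected k-nearest-neighbor graph built with SciPy, and the names `two_blobs` and `component_counts` are hypothetical helpers introduced only for this example. Because the neighbor lists for smaller k are prefixes of those for larger k, the graphs are nested and the component count can only stay the same or decrease as k grows.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree


def two_blobs(n_per_blob=100, gap=10.0, seed=0):
    """Toy data (an assumption for illustration): two tight, well-separated 2-D blobs."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 0.1, size=(n_per_blob, 2))
    b = rng.normal(0.0, 0.1, size=(n_per_blob, 2)) + np.array([gap, 0.0])
    return np.vstack([a, b])


def component_counts(points, ks):
    """Number of connected components of the undirected k-NN graph, for each k in ks."""
    n = len(points)
    tree = cKDTree(points)
    # One query at the largest k; smaller k reuse a prefix of the same neighbor lists,
    # which keeps the graphs nested across the sweep.
    _, idx = tree.query(points, k=max(ks) + 1)
    counts = []
    for k in ks:
        # The k + 1 nearest points normally include the point itself;
        # the resulting self-loops do not affect connectivity.
        rows = np.repeat(np.arange(n), k + 1)
        cols = idx[:, : k + 1].ravel()
        graph = csr_matrix((np.ones(rows.size), (rows, cols)), shape=(n, n))
        n_comp, _ = connected_components(graph, directed=False)
        counts.append(n_comp)
    return counts


if __name__ == "__main__":
    ks = [1, 2, 4, 8, 16, 32, 64, 128]
    for k, c in zip(ks, component_counts(two_blobs(), ks)):
        print(f"k={k:3d}  components={c}")
```

On this toy data, small k leaves each blob fragmented, intermediate k recovers the two blobs, and once k exceeds the number of points in a blob, bridging edges are forced and the components fuse. The range of k over which the count is stable gives a rough, informal analogue of the bracket described in the abstract.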