Cover learning for large-scale topology representation

Luis Scoccola; Uzu Lim; Heather A. Harrington

Published: 01 May 2025, Last Modified: 13 Aug 2025, Venue: ICML 2025 poster, License: CC BY 4.0

Abstract: Classical unsupervised learning methods like clustering and linear dimensionality reduction parametrize large-scale geometry when it is discrete or linear, while more modern methods from manifold learning find low-dimensional representations or infer local geometry by constructing a graph on the input data. More recently, topological data analysis popularized the use of simplicial complexes to represent data topology with two main methodologies: topological inference with geometric complexes and large-scale topology representation with Mapper graphs. Central to both is the nerve construction from topology, which builds a simplicial complex given any cover of a space by subsets. While successful, these methodologies have limitations: geometric complexes scale poorly with data size, and Mapper graphs can be hard to tune and only contain low-dimensional information. In this paper, we propose to study the problem of learning covers in its own right, and from the perspective of optimization. We describe a method to learn topologically faithful covers of geometric datasets, and show that the simplicial complexes thus obtained can outperform standard topological inference approaches in terms of size, and Mapper-type algorithms in terms of representation of large-scale topology.

Lay Summary: We propose the cover learning problem as a general unsupervised learning problem. Given an input geometric dataset (such as a point cloud), the goal is to produce a cover of the data (a set of subsets whose union is the entire dataset) that encodes the large-scale topology of the data. Cover learning generalizes clustering, and has applications for both data visualization and topological inference (as in topological data analysis). We give a formal interpretation of the cover learning problem from the viewpoint of geometry and topology, and derive a principled loss function for cover learning in the idealized scenario where the data consists of a Riemannian manifold. We propose practical estimators for the terms in our loss function, in the case where the space is a weighted graph, and show that optimization is feasible using known optimization tools, including (graph) neural networks and topological optimization. We provide an implementation of ShapeDiscover, a cover learning algorithm based on our theory, and showcase it on two sets of experiments: a quantitative one on topological inference, and a qualitative one on large-scale topology visualization. In the first case, ShapeDiscover learns topologically correct simplicial complexes, on synthetic and real data, of smaller size than those obtained with previous topological inference approaches. In the second, ShapeDiscover represents the large-scale topology of real data better, and with more intuitive parameters, than previous TDA algorithms that fit the cover learning framework.
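
To make the notions of cover and nerve concrete, here is a minimal Python sketch of the standard nerve construction on a toy example. It is not the ShapeDiscover implementation and does not learn a cover; the function names `is_cover` and `nerve`, the `max_dim` parameter, and the six-point "circle" dataset are assumptions made for illustration only. A tuple of cover-element indices is a simplex of the nerve exactly when the corresponding subsets share at least one point.

```python
from itertools import combinations


def is_cover(points, cover):
    """Return True if the union of the subsets in `cover` contains every point."""
    return set(points) <= set().union(*cover)


def nerve(cover, max_dim=2):
    """List the simplices of the nerve of `cover` up to dimension `max_dim`.

    A simplex is a tuple of indices into `cover`; it is included when the
    indexed subsets have a non-empty common intersection.
    """
    sets = [set(s) for s in cover]
    simplices = []
    for k in range(1, max_dim + 2):  # k subsets span a (k - 1)-simplex
        for idx in combinations(range(len(sets)), k):
            if set.intersection(*(sets[i] for i in idx)):
                simplices.append(idx)
    return simplices


# Hypothetical toy data: six points arranged around a circle, labelled 0..5.
points = range(6)

# Three overlapping arcs: consecutive arcs share one point, and no point lies
# in all three, so the nerve has 3 vertices, 3 edges, and no triangle.
arcs = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]

# A clustering is the special case of a cover by disjoint subsets: its nerve
# has vertices only.
clusters = [{0, 1, 2}, {3, 4, 5}]

arc_nerve = nerve(arcs)
cluster_nerve = nerve(clusters)
```

In this toy case the nerve of `arcs` is a hollow triangle, which has the same large-scale topology as the circle the points were sampled from, while the nerve of `clusters` only records two connected pieces. Brute-force enumeration of index subsets is used for clarity; it is only practical for small covers.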

Link To Code: https://github.com/LuisScoccola/shapediscover

Primary Area: General Machine Learning->Unsupervised and Semi-supervised Learning

Keywords: topological data analysis, cover, simplicial complex, topological inference, mapper, manifold learning

Submission Number: 11665
