MCbiF: Measuring Topological Autocorrelation in Multiscale Clusterings via 2-Parameter Persistent Homology
Keywords: topological data analysis, multiparameter persistent homology, multiscale clustering, non-hierarchical clustering, Sankey diagrams
Abstract: Datasets in applications often possess an intrinsic multiscale structure with meaningful descriptions at different levels of coarseness. Such datasets are naturally described as multi-resolution clusterings, i.e., not necessarily hierarchical sequences of partitions across scales. To analyse and compare such sequences, we use tools from topological data analysis and define the Multiscale Clustering Bifiltration (MCbiF), a 2-parameter filtration of abstract simplicial complexes that encodes cluster intersection patterns across scales. The MCbiF can be interpreted as a higher-order extension of Sankey diagrams and reduces to a dendrogram for hierarchical sequences. We show that the multiparameter persistent homology (MPH) of the MCbiF yields a finitely presented and block-decomposable module, and its stable Hilbert functions characterise the topological autocorrelation of the sequence of partitions, providing topological feature maps for downstream machine learning tasks. In particular, at dimension zero, the MPH captures the degree of nestedness of the sequence, whereas at dimension one, it captures higher-order inconsistencies between clusters across scales. We demonstrate in our experiments that the MCbiF feature maps outperform information-based baseline features on both regression and classification tasks on non-hierarchical sequences of partitions. We also apply the MCbiF to real-world data to measure non-hierarchies in the social grouping patterns of wild mice across time.
Primary Area: learning on graphs and other geometries & topologies
Submission Number: 19421