['1c1', '< Title: TOPOLOGICAL BLINDSPOTS: UNDERSTANDING AND EXTENDING TOPOLOGICAL DEEP LEARNING THROUGH THE LENS OF EXPRESSIVITY', '---', '> Title: Bridging Topological Blindspots: Enhancing Expressivity in Topological Deep Learning with Multi-Cellular Networks', '3c3', "< Abstract: Topological deep learning (TDL) is a rapidly growing field that seeks to leverage topological structure in data and facilitate learning from data supported on topological objects, ranging from molecules to 3D shapes. Most TDL architectures can be unified under the framework of higher-order message-passing (HOMP), which generalizes graph message-passing to higher-order domains. In the first part of the paper, we explore HOMP's expressive power from a topological perspective, demonstrating the framework's inability to capture fundamental topological and metric invariants such as diameter, orientability, planarity, and homology. In addition, we demonstrate HOMP's limitations in fully leveraging lifting and pooling methods on graphs. To the best of our knowledge, this is the first work to study the expressivity of TDL from a topological perspective. In the second part of the paper, we develop two new classes of architectures, multi-cellular networks (MCN) and scalable MCN (SMCN), which draw inspiration from expressive GNNs. MCN can reach full expressivity, but scaling it to large data objects can be computationally expensive. Designed as a more scalable alternative, SMCN still mitigates many of HOMP's expressivity limitations. Finally, we create new benchmarks for evaluating models based on their ability to learn topological properties of complexes. We then evaluate SMCN on these benchmarks and on real-world graph datasets, demonstrating improvements over both HOMP baselines and expressive graph methods, highlighting the value of expressively leveraging topological information. Code and data are available at https://github.com/yoavgelberg/SMCN. * Equal contribution. 
1 Generally, d_{r_1} ≠ d_{r_2}, e.g. atoms (0-cells) and bonds (1-cells) might have a different number of features. 2 This is equivalent to setting MLP^{(t)}_{i,r} ≡ 0 for neighborhood functions N_i that are not associated with an incoming edge. Equation 3 in its full generality corresponds to a fully-connected tensor diagram. 3 X and X′ are isomorphic if there exists a bijective mapping ϕ : X → X′ which is both rank-preserving and inclusion-preserving; see Appendix E for a formal definition.", '---', "> Abstract: Topological Deep Learning (TDL) is an evolving field that harnesses topological data structures for machine learning, spanning applications from molecules to 3D models. Higher-Order Message Passing (HOMP) frameworks unify many TDL architectures, extending conventional graph message-passing to higher-order domains. However, a comprehensive understanding of HOMP's expressive limitations, particularly its inability to capture fundamental topological and metric invariants like diameter, orientability, planarity, and homology, has been lacking. This paper provides the first topological expressivity analysis of TDL, revealing significant blindspots and limitations in leveraging lifting and pooling methods. To address these, we introduce two novel classes of architectures: Multi-Cellular Networks (MCN) and Scalable MCN (SMCN). MCN achieves full expressivity, provably distinguishing non-isomorphic topological objects, while SMCN offers a scalable alternative that effectively mitigates many of HOMP's expressivity shortcomings. We further establish new benchmarks specifically designed to evaluate models' abilities to learn topological properties. Empirical evaluations on these new benchmarks and real-world graph datasets show that SMCN improves over both HOMP baselines and expressive graph methods, underscoring the value of explicitly leveraging rich topological information. 
Code and data are available at https://github.com/yoavgelberg/SMCN.", '6,12c6,21', '< Topological Deep Learning (TDL) is an emerging field focused on learning from data supported on topological objects. Higher-order message-passing (HOMP) (Hajij et al., 2022a;b) has emerged as a key framework in TDL, unifying architectures designed for various topological data types. Originally introduced for simplicial complexes (Bodnar et al., 2021b), HOMP has been successively adapted for cellular complexes (Bodnar et al., 2021a;Hajij et al., 2020), and more recently, for combinatorial complexes (Hajij et al., 2022a;b). Each adaptation is a direct generalization of its predecessor. The HOMP framework extends traditional message-passing neural networks (MPNNs) (Gilmer et al., 2017), widely used in graph learning, to higher-order topological domains.', "< Despite their widespread adoption in various graph learning applications, MPNNs are known to struggle with expressivity limitations, often failing to distinguish even simple non-isomorphic graphs (Morris et al., 2019;Xu et al., 2018). This realization has led to a substantial body of work dedicated to developing more expressive graph architectures (Morris et al., 2023;Maron et al., 2019;Morris et al., 2019;Bevilacqua et al., 2021;Abboud et al., 2020;Bouritsas et al., 2022). Given the similarity between HOMP and MPNNs, a natural question arises: What are the limitations of higher-order message-passing architectures in distinguishing topological objects? This question, highlighted in a recent position paper (Papamarkou et al., 2024), is the main focus of this paper. We address this question from a topological perspective. First, we introduce a topological criterion designed to identify cases in which a pair of complexes is indistinguishable by HOMP. 
We then use this criterion to prove HOMP's inability to differentiate between complexes based on several fundamental topological and metric invariants, including diameter, orientability, planarity, and homology groups. These limitations are particularly noteworthy, as TDL's main goal is to leverage topological structure in data. In fact, several methods directly inject information closely related to some of the above properties into pre-existing frameworks (Horn et al., 2021;Chen et al., 2021;Rieck, 2023;Zhang et al., 2023c). Additionally, since many topological data objects are constructed by lifting graph data, we examine HOMP's limitations in expressively leveraging lifting and pooling methods to distinguish graphs.", "< In the second part of the paper, we introduce a new class of TDL architectures called multi-cellular networks (MCN) designed to address HOMP's expressivity limitations. MCN draws inspiration from higher-order graph architectures (Maron et al., 2019;Morris et al., 2019;Keriven & Peyré, 2019;Azizian & Lelarge, 2020), which successfully resolve expressivity limitations in MPNNs. MCN utilizes the equivariant linear layers introduced in Maron et al. (2018) and integrates them into the HOMP pipeline, resulting in architectures reminiscent of Invariant Graph Networks (IGNs) introduced in the same paper. We prove that MCN can reach full expressivity in distinguishing non-isomorphic complexes. Recognizing the scalability challenges of both IGNs and MCN, we propose an alternative called scalable MCN (SMCN). SMCN models apply expressive graph layers, often used as practical alternatives to IGNs, on graph structures defined over the cells of the complex. 
We prove that SMCN still mitigates many of HOMP's expressivity limitations.", "< We empirically evaluate SMCN on several real-world (lifted) graph benchmarks and find performance gains over both standard HOMP baselines and expressive GNNs, highlighting the value of expressively leveraging topological information. Additionally, we design three benchmarks to assess TDL architectures' ability to capture topological and metric information. The first, called the Torus Dataset, is a BREC-like (Wang & Zhang, 2024) dataset consisting of pairs of cellular complexes comprising one or more disjoint tori. Models are tasked with separating each pair in a statistically significant way. The two other benchmarks evaluate models based on their ability to predict topological properties of complexes obtained by lifting molecular graphs from ZINC (Sterling & Irwin, 2015).", "< Our contributions. Summarizing, the key contributions of this paper are as follows: (1) We provide a comprehensive analysis of HOMP's expressive power, evaluating its ability to capture topological and metric invariants and to leverage lifting and pooling methods.", '< (2) We introduce multi-cellular networks (MCN), a novel class of TDL models, inspired by IGNs, which can provably reach full expressivity.', "< (3) We develop SMCN, a scalable version of MCN that addresses HOMP's expressivity limitations while maintaining computational efficiency. (4) We construct three benchmarks for assessing the topological expressivity of TDL architectures. (5) We empirically evaluate the performance of SMCN, demonstrating improvements over both standard TDL methods and expressive graph models, highlighting the benefits of expressively leveraging topological information.", '---', '> Topological Deep Learning (TDL) is a rapidly advancing field dedicated to extracting insights from data intrinsically structured on topological objects. 
Higher-Order Message Passing (HOMP) (Hajij et al., 2022a;b) has emerged as a cornerstone framework within TDL, providing a unified approach for architectures operating on diverse topological data types, including simplicial (Bodnar et al., 2021b), cellular (Bodnar et al., 2021a;Hajij et al., 2020), and combinatorial complexes (Hajij et al., 2022a;b). HOMP thereby extends traditional Message Passing Neural Networks (MPNNs) (Gilmer et al., 2017), widely used in graph learning, to higher-order topological domains.', '> ', '> Despite their prevalence, MPNNs are known to suffer from significant expressivity limitations, often failing to distinguish even simple non-isomorphic graphs (Morris et al., 2019;Xu et al., 2018). This has spurred extensive research into developing more expressive graph architectures (Morris et al., 2023;Maron et al., 2019;Morris et al., 2019;Bevilacqua et al., 2021;Abboud et al., 2020;Bouritsas et al., 2022). Given the foundational similarities between HOMP and MPNNs, a critical unanswered question arises: What are the inherent limitations of higher-order message-passing architectures in distinguishing topological objects? This question, recently highlighted in a position paper (Papamarkou et al., 2024), forms the central focus of this work.', '> ', "> We address this question through a topological expressivity analysis. First, we establish a novel topological criterion for HOMP-indistinguishability based on covering spaces. Using this criterion, we prove HOMP's inability to differentiate between complexes based on fundamental topological and metric invariants, including diameter, orientability, planarity, and homology groups. These findings are particularly salient, as the core objective of TDL is precisely to leverage such topological structures. 
Indeed, several existing methods attempt to inject information related to these properties into pre-existing frameworks (Horn et al., 2021;Chen et al., 2021;Rieck, 2023;Zhang et al., 2023c). Furthermore, recognizing that many topological datasets are constructed by lifting graph data, we also investigate HOMP's limitations in expressively utilizing lifting and pooling methods to distinguish graphs.", '> ', "> In the second part of this paper, we introduce Multi-Cellular Networks (MCN), a new class of TDL architectures specifically designed to overcome HOMP's expressivity limitations. MCN draws inspiration from highly expressive higher-order graph architectures (Maron et al., 2019;Morris et al., 2019;Keriven & Peyré, 2019;Azizian & Lelarge, 2020), which have successfully addressed MPNN expressivity issues. By integrating the equivariant linear layers introduced in Maron et al. (2018) into the HOMP pipeline, MCN yields architectures reminiscent of the Invariant Graph Networks (IGNs) introduced in the same paper. We formally prove that MCN can achieve full expressivity in distinguishing non-isomorphic complexes. Acknowledging the scalability challenges inherent in both IGNs and MCN, we propose a more efficient alternative: Scalable MCN (SMCN). SMCN applies expressive graph layers, often used as practical substitutes for IGNs, to augmented graph structures defined over the cells of the complex. We prove that SMCN mitigates many of HOMP's expressivity limitations, while remaining more computationally tractable than MCN.", '> ', "> We conduct extensive empirical evaluations of SMCN on several real-world (lifted) graph benchmarks, demonstrating performance gains over both standard HOMP baselines and expressive GNNs. This underscores the practical value of expressively leveraging topological information. Additionally, we introduce three novel benchmarks specifically designed to assess TDL architectures' ability to capture topological and metric properties. 
The first, the Torus Dataset, is a BREC-like (Wang & Zhang, 2024) dataset comprising pairs of cellular complexes, each made up of one or more disjoint tori, whose members differ in topological or metric invariants yet are provably indistinguishable by HOMP models; models are tasked with separating each pair in a statistically significant way. The other two benchmarks evaluate models' ability to predict topological properties of complexes derived from molecular graphs in the ZINC dataset (Sterling & Irwin, 2015).", '> ', '> Our contributions. In summary, the key contributions of this paper are:', '> (1) A comprehensive topological expressivity analysis of HOMP, evaluating its capacity to capture fundamental topological and metric invariants and its limitations in leveraging lifting and pooling methods.', '> (2) The introduction of Multi-Cellular Networks (MCN), a novel class of TDL models inspired by IGNs, which are provably fully expressive for distinguishing non-isomorphic combinatorial complexes (CCs).', "> (3) The development of Scalable MCN (SMCN), a practical and efficient variant of MCN that effectively addresses many of HOMP's expressivity limitations.", '> (4) The construction of three novel benchmarks specifically designed for assessing the topological expressivity of TDL architectures.', '> (5) Extensive empirical evaluation of SMCN, demonstrating improvements over both standard TDL methods and expressive graph models on real-world and synthetic benchmarks, thereby highlighting the practical benefits of expressively leveraging higher-order topological information.', '19,22c28,39', '< Notation. We denote [n] = {1, . . . , n}. The size of a set S is denoted by |S|. and denote aggregation functions, where is permutation invariant. Bold lowercase letters denote tuples of integers e.g. k = (k_0, . . . , k_ℓ). e_i denotes the tuple with one at the i-th position and zeros elsewhere.', '< Combinatorial complexes. Combinatorial complexes (CCs) are a class of higher-order objects that can flexibly represent many types of hierarchical data. 
Most topological data domains, including simplicial complexes, cellular complexes, and hypergraphs, can be considered subclasses of combinatorial complexes. Therefore, throughout the paper, all data objects are represented as CCs.', '< Definition 3.1 (Combinatorial complex). A combinatorial complex (CC) is a 3-tuple (S, X, rk) comprising a node set S, a cell set X ⊆ P(S) \\ {∅}, and a rank function rk : X → Z_{≥0} such that ∀s ∈ S, {s} ∈ X, rk({s}) = 0, and ∀x, y ∈ X: x ⊆ y ⇒ rk(x) ≤ rk(y).', '< The set of r-rank cells (r-cells) is called the r-skeleton and is denoted by X_r = rk^{-1}(r), its size is denoted by n_r := |X_r|; the dimension of a CC is ℓ = max_{x∈X} rk(x). We often simplify the notation and use X to denote the entire CC. For definitions of simplicial and cellular complexes, we refer the reader to Bodnar et al. (2021a) and Bodnar et al. (2021b).', '---', '> Notation. We denote the set of integers from 1 to n as [n] = {1, . . . , n}. The cardinality of a set S is denoted by |S|. Aggregation functions, commonly used in message-passing, are generally permutation-invariant. Bold lowercase letters, such as k = (k_0, . . . , k_ℓ), denote tuples of integers, and e_i denotes a tuple with a one at the i-th position and zeros elsewhere, used to specify a particular rank.', '> ', '> Combinatorial Complexes. Combinatorial complexes (CCs) offer a highly flexible and general framework for representing hierarchical data, encompassing a broad spectrum of topological data domains such as simplicial complexes, cellular complexes, and hypergraphs as special cases. This generality makes CCs a powerful choice for modeling complex relationships beyond pairwise interactions. Consequently, in this paper, all data objects are uniformly represented as CCs.', '> ', '> Definition 3.1 (Combinatorial Complex). 
A combinatorial complex (CC) is formally defined as a 3-tuple (S, X, rk), where:', '> - S is a finite set of base elements, referred to as nodes.', '> - X is a collection of non-empty subsets of S, called cells, such that for every node s ∈ S, the singleton set {s} is a cell in X.', '> - rk : X → Z_{≥0} is a rank function that assigns a non-negative integer rank to each cell, satisfying two conditions:', '>     1. For any node s ∈ S, rk({s}) = 0 (nodes are 0-cells).', '>     2. For any two cells x, y ∈ X, if x is a subset of y (x ⊆ y), then rk(x) ≤ rk(y) (preserving hierarchy).', '> ', '> The set of all cells with rank r is termed the r-skeleton, denoted by X_r = rk^{-1}(r), and its size is n_r := |X_r|. The dimension of a CC, denoted by ℓ, is the maximum rank assigned to any cell, i.e., ℓ = max_{x∈X} rk(x). For brevity, we often use X to refer to the entire CC. For detailed definitions of simplicial and cellular complexes, the reader is referred to Bodnar et al. (2021b) and Bodnar et al. (2021a), respectively.', '1056d1072', '< ']
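Definition 3.1 can be made concrete with a small data structure. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation: the class `CombinatorialComplex` and its methods `skeleton`, `n`, and `dim` are hypothetical names, cells are stored as frozensets of nodes, the rank function rk is a plain dict, and any missing singleton cell {s} is added with rank 0 as a convenience. The sketch checks the conditions of the definition and computes the r-skeleton X_r, its size n_r, and the dimension ℓ.

```python
class CombinatorialComplex:
    """Hypothetical container for a CC (S, X, rk) as in Definition 3.1."""

    def __init__(self, nodes, rank):
        # nodes: iterable of hashable labels, the node set S.
        # rank: dict mapping each cell (any iterable of nodes) to its rank.
        self.nodes = frozenset(nodes)
        self.rank = {frozenset(cell): r for cell, r in rank.items()}
        # Convenience assumption: add every singleton {s} as a 0-cell if absent.
        for s in self.nodes:
            self.rank.setdefault(frozenset([s]), 0)
        self._validate()

    def _validate(self):
        for cell, r in self.rank.items():
            # Cells must be non-empty subsets of S.
            if not cell or not cell <= self.nodes:
                raise ValueError(f"{set(cell)} is empty or not a subset of S")
            # rk maps into the non-negative integers.
            if not isinstance(r, int) or r < 0:
                raise ValueError(f"rank of {set(cell)} must be a non-negative int")
            # Nodes are 0-cells: rk({s}) = 0.
            if len(cell) == 1 and r != 0:
                raise ValueError(f"node cell {set(cell)} must have rank 0")
        # Monotonicity: x ⊆ y implies rk(x) <= rk(y); only proper subsets need checking.
        for x, rx in self.rank.items():
            for y, ry in self.rank.items():
                if x < y and rx > ry:
                    raise ValueError(f"{set(x)} ⊆ {set(y)} but rank {rx} > {ry}")

    def skeleton(self, r):
        """The r-skeleton X_r = rk^{-1}(r)."""
        return {cell for cell, rc in self.rank.items() if rc == r}

    def n(self, r):
        """n_r := |X_r|."""
        return len(self.skeleton(r))

    @property
    def dim(self):
        """The dimension ℓ, the maximum rank over all cells."""
        return max(self.rank.values())


if __name__ == "__main__":
    # A filled triangle: three nodes, three edges (rank 1), one face (rank 2).
    cc = CombinatorialComplex(
        nodes=[1, 2, 3],
        rank={(1, 2): 1, (2, 3): 1, (1, 3): 1, (1, 2, 3): 2},
    )
    print(cc.n(0), cc.n(1), cc.n(2), cc.dim)  # 3 3 1 2
```

The monotonicity check compares every pair of cells, so it performs O(|X|^2) subset tests; that is adequate for small illustrative complexes, while a practical pipeline would likely exploit incidence structure instead.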
