Title: TOPOLOGICAL BLINDSPOTS: UNDERSTANDING AND EXTENDING TOPOLOGICAL DEEP LEARNING THROUGH THE LENS OF EXPRESSIVITY

Abstract: Topological deep learning (TDL) is a rapidly growing field that seeks to leverage topological structure in data and facilitate learning from data supported on topological objects, ranging from molecules to 3D shapes. Most TDL architectures can be unified under the framework of higher-order message-passing (HOMP), which generalizes graph message-passing to higher-order domains. In the first part of the paper, we explore HOMP's expressive power from a topological perspective, demonstrating the framework's inability to capture fundamental topological and metric invariants such as diameter, orientability, planarity, and homology. In addition, we demonstrate HOMP's limitations in fully leveraging lifting and pooling methods on graphs. To the best of our knowledge, this is the first work to study the expressivity of TDL from a topological perspective. In the second part of the paper, we develop two new classes of architectures, multi-cellular networks (MCN) and scalable MCN (SMCN), which draw inspiration from expressive GNNs. MCN can reach full expressivity, but scaling it to large data objects can be computationally expensive. Designed as a more scalable alternative, SMCN still mitigates many of HOMP's expressivity limitations. Finally, we create new benchmarks for evaluating models based on their ability to learn topological properties of complexes. We then evaluate SMCN on these benchmarks and on real-world graph datasets, demonstrating improvements over both HOMP baselines and expressive graph methods, highlighting the value of expressively leveraging topological information. Code and data are available at https://github.com/yoavgelberg/SMCN.
* Equal contribution.
$^{1}$ Generally, $d_{r_1} \neq d_{r_2}$; e.g., atoms (0-cells) and bonds (1-cells) might have a different number of features.
$^{2}$ This is equivalent to setting $\mathrm{MLP}^{(t)}_{i,r} \equiv 0$ for neighborhood functions $N_i$ that are not associated with an incoming edge. Equation 3 in its full generality corresponds to a fully-connected tensor diagram.
$^{3}$ $\mathcal{X}$ and $\mathcal{X}'$ are isomorphic if there exists a bijective mapping $\phi : \mathcal{X} \to \mathcal{X}'$ which is both rank-preserving and inclusion-preserving; see Appendix E for a formal definition.

Section: INTRODUCTION
Topological Deep Learning (TDL) is an emerging field focused on learning from data supported on topological objects. Higher-order message-passing (HOMP) (Hajij et al., 2022a;b) has emerged as a key framework in TDL, unifying architectures designed for various topological data types. Originally introduced for simplicial complexes (Bodnar et al., 2021b), HOMP has been successively adapted for cellular complexes (Bodnar et al., 2021a;Hajij et al., 2020), and more recently, for combinatorial complexes (Hajij et al., 2022a;b). Each adaptation is a direct generalization of its predecessor. The HOMP framework extends traditional message-passing neural networks (MPNNs) (Gilmer et al., 2017), widely used in graph learning, to higher-order topological domains.
Despite their widespread adoption in various graph learning applications, MPNNs are known to struggle with expressivity limitations, often failing to distinguish even simple non-isomorphic graphs (Morris et al., 2019; Xu et al., 2018). This realization has led to a substantial body of work dedicated to developing more expressive graph architectures (Morris et al., 2023; Maron et al., 2019; Morris et al., 2019; Bevilacqua et al., 2021; Abboud et al., 2020; Bouritsas et al., 2022). Given the similarity between HOMP and MPNNs, a natural question arises: What are the limitations of higher-order message-passing architectures in distinguishing topological objects? This question, highlighted in a recent position paper (Papamarkou et al., 2024), is the main focus of this paper. We address this question from a topological perspective. First, we introduce a topological criterion designed to identify cases in which a pair of complexes is indistinguishable by HOMP. We then use this criterion to prove HOMP's inability to differentiate between complexes based on several fundamental topological and metric invariants, including diameter, orientability, planarity, and homology groups. These limitations are particularly noteworthy, as TDL's main goal is to leverage topological structure in data. In fact, several methods directly inject information closely related to some of the above properties into pre-existing frameworks (Horn et al., 2021; Chen et al., 2021; Rieck, 2023; Zhang et al., 2023c). Additionally, since many topological data objects are constructed by lifting graph data, we examine HOMP's limitations in expressively leveraging lifting and pooling methods to distinguish graphs.
In the second part of the paper, we introduce a new class of TDL architectures called multi-cellular networks (MCN), designed to address HOMP's expressivity limitations. MCN draws inspiration from higher-order graph architectures (Maron et al., 2019; Morris et al., 2019; Keriven & Peyré, 2019; Azizian & Lelarge, 2020), which successfully resolve expressivity limitations in MPNNs. MCN utilizes the equivariant linear layers introduced in Maron et al. (2018) and integrates them into the HOMP pipeline, resulting in architectures reminiscent of Invariant Graph Networks (IGNs) introduced in the same paper. We prove that MCN can reach full expressivity in distinguishing non-isomorphic complexes. Recognizing the scalability challenges of both IGNs and MCN, we propose an alternative called scalable MCN (SMCN). SMCN models apply expressive graph layers, often used as practical alternatives to IGNs, on graph structures defined over the cells of the complex. We prove that SMCN still mitigates many of HOMP's expressivity limitations.
We empirically evaluate SMCN on several real-world (lifted) graph benchmarks and find performance gains over both standard HOMP baselines and expressive GNNs, highlighting the value of expressively leveraging topological information. Additionally, we design three benchmarks to assess TDL architectures' ability to capture topological and metric information. The first, called the Torus Dataset, is a BREC-like (Wang & Zhang, 2024) dataset consisting of pairs of cellular complexes comprising one or more disjoint tori. Models are tasked with separating each pair in a statistically significant way. The two other benchmarks evaluate models based on their ability to predict topological properties of complexes obtained by lifting molecular graphs from ZINC (Sterling & Irwin, 2015).
Our contributions. In summary, the key contributions of this paper are as follows: (1) We provide a comprehensive analysis of HOMP's expressive power, evaluating its ability to capture topological and metric invariants and to leverage lifting and pooling methods.
(2) We introduce multi-cellular networks (MCN), a novel class of TDL models, inspired by IGNs, which can provably reach full expressivity.
(3) We develop SMCN, a scalable version of MCN that addresses HOMP's expressivity limitations while maintaining computational efficiency. (4) We construct three benchmarks for assessing the topological expressivity of TDL architectures. (5) We empirically evaluate the performance of SMCN, demonstrating improvements over both standard TDL methods and expressive graph models, highlighting the benefits of expressively leveraging topological information.

Section: PREVIOUS WORK
Topological Deep Learning. TDL architectures enable learning from data supported on topological objects, traditionally focusing on four domains: hypergraphs, simplicial complexes, cellular complexes, and combinatorial complexes. The prominent framework for the latter three is higher-order message-passing (HOMP). Originally introduced for simplicial complexes (Bodnar et al., 2021b) and later extended to cellular (Bodnar et al., 2021a; Hajij et al., 2020) and combinatorial complexes (Hajij et al., 2022a;b), HOMP architectures achieve strong experimental results and have been shown to enhance the expressive power of MPNNs. Another approach in TDL research incorporates precomputed topological information into existing models like MPNNs (Horn et al., 2021; Chen et al., 2021; Rieck, 2023) and HOMP (Verma et al., 2024; Buffelli et al., 2024). These methods enhance the expressive power of the base models and show strong experimental results, highlighting the value of integrating topological features. Most prior work focuses on the expressivity of TDL models with respect to graphs; the ability of HOMP to capture topological and metric invariants of complexes without relying on pre-computation remains unexplored.
Expressive power of GNNs. The expressivity of GNNs is typically evaluated in terms of their separation power, i.e. their ability to assign distinct values to non-isomorphic graphs. Seminal works by Morris et al. (2019) and Xu et al. (2018) demonstrate that the expressive power of MPNNs is equivalent to that of the 1-WL test (Weisfeiler & Leman, 1968). These findings inspired the development of more expressive GNNs, with expressive power surpassing that of the 1-WL test, albeit often requiring greater computational resources. Morris et al. (2019) and Maron et al. (2018) propose architectures that are as expressive as the $k$-WL test with $O(n^k)$ runtime and memory complexity. Various other expressive GNNs have been introduced in the literature, utilizing techniques such as random features (Abboud et al., 2020), substructure counts (Bouritsas et al., 2022), equivariant polynomials (Maron et al., 2019; Puny et al., 2023), processing sets of subgraphs (Bevilacqua et al., 2021; Frasca et al., 2022; Zhang et al., 2023b; Zhang & Li, 2021; Cotta et al., 2021) and more. We use these frameworks, specifically the architectures proposed in Maron et al. (2019), Zhang et al. (2023b), and Bar-Shalom et al. (2024), to construct efficient and expressive models for combinatorial complexes. Bamberger (2022) offers a perspective on MPNN expressivity through the lens of graph coverings. We extend their work to combinatorial complexes and use it to analyze the topological expressivity of HOMP. For a comprehensive review of expressive graph architectures, refer to the following surveys: Jegelka (2022), Morris et al. (2023), and Zhang et al. (2023a).

Section: PRELIMINARIES
Notation. We denote $[n] = \{1, \dots, n\}$. The size of a set $S$ is denoted by $|S|$. $\bigoplus$ and $\bigotimes$ denote aggregation functions, where $\bigoplus$ is permutation invariant. Bold lowercase letters denote tuples of integers, e.g. $\mathbf{k} = (k_0, \dots, k_\ell)$. $e_i$ denotes the tuple with one at the $i$-th position and zeros elsewhere.
Combinatorial complexes. Combinatorial complexes (CCs) are a class of higher-order objects that can flexibly represent many types of hierarchical data. Most topological data domains, including simplicial complexes, cellular complexes, and hypergraphs, can be considered subclasses of combinatorial complexes. Therefore, throughout the paper, all data objects are represented as CCs.
Definition 3.1 (Combinatorial complex). A combinatorial complex (CC) is a 3-tuple $(S, \mathcal{X}, \mathrm{rk})$ comprising a node set $S$, a cell set $\mathcal{X} \subseteq \mathcal{P}(S) \setminus \{\emptyset\}$, and a rank function $\mathrm{rk} : \mathcal{X} \to \mathbb{Z}_{\geq 0}$ such that $\forall s \in S$, $\{s\} \in \mathcal{X}$ and $\mathrm{rk}(\{s\}) = 0$, and $\forall x, y \in \mathcal{X}$, $x \subseteq y \Rightarrow \mathrm{rk}(x) \leq \mathrm{rk}(y)$.
The set of $r$-rank cells ($r$-cells) is called the $r$-skeleton and is denoted by $\mathcal{X}_r = \mathrm{rk}^{-1}(r)$; its size is denoted by $n_r := |\mathcal{X}_r|$. The dimension of a CC is $\ell = \max_{x \in \mathcal{X}} \mathrm{rk}(x)$. We often simplify the notation and use $\mathcal{X}$ to denote the entire CC. For definitions of simplicial and cellular complexes, we refer the reader to Bodnar et al. (2021a) and Bodnar et al. (2021b).
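Neighborhood functions. Neighborhood functions are a key component in HOMP, facilitating dynamic aggregation of information across cells. Formally, a neighborhood function can be any function $N : \mathcal{X} \to \mathcal{P}(\mathcal{X})$, but the most common neighborhood functions are:
(1) The $(r_1, r_2)$-adjacency and co-adjacency, defined by
$$A_{r_1,r_2}(x) = \{y \in \mathcal{X}_{r_1} \mid \exists z \in \mathcal{X}_{r_2} \text{ s.t. } x, y \subseteq z\}, \qquad coA_{r_1,r_2}(x) = \{y \in \mathcal{X}_{r_1} \mid \exists z \in \mathcal{X}_{r_2} \text{ s.t. } z \subseteq x, y\}, \tag{1}$$
for $x \in \mathcal{X}_{r_1}$, and $A_{r_1,r_2}(x) = coA_{r_1,r_2}(x) = \emptyset$ for $x \notin \mathcal{X}_{r_1}$.
(2) The $(r_1, r_2)$-upper and lower incidence, defined by
$$B_{r_1,r_2}(x) = \{y \in \mathcal{X}_{r_2} \mid x \subseteq y\}, \qquad B^{\top}_{r_1,r_2}(x) = \{y \in \mathcal{X}_{r_2} \mid y \subseteq x\}, \tag{2}$$
for $x \in \mathcal{X}_{r_1}$, and $B_{r_1,r_2}(x) = B^{\top}_{r_1,r_2}(x) = \emptyset$ for $x \notin \mathcal{X}_{r_1}$.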
We call the neighborhood functions defined above natural neighborhood functions, a collection we denote by N nat . See Appendix A.2 for an illustration. Given an enumeration of the cells, a neighborhood function can be represented in matrix form. For example, given a graph G = (V, E) viewed as a one dimensional CC through S = V, X 0 = {{v} | v ∈ V} and X 1 = E, the matrix forms of the neighborhood functions A 0,1 and B 0,1 are the graph adjacency and incidence matrices respectively.
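As a concrete illustration of Equations 1 and 2, the following minimal sketch evaluates the four natural neighborhood functions on a toy complex (a single filled triangle). The representation of cells as `frozenset`s and all function names are assumptions made for this example; they are not taken from the paper's code.

```python
# Toy combinatorial complex: one filled triangle on nodes {0, 1, 2}.
# Cells are frozensets of nodes; `rank` maps each cell to its rank (assumed encoding).
rank = {
    frozenset({0}): 0, frozenset({1}): 0, frozenset({2}): 0,
    frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({0, 2}): 1,
    frozenset({0, 1, 2}): 2,
}

def skeleton(r):
    """The r-skeleton: all cells of rank r."""
    return {c for c, rk in rank.items() if rk == r}

def adjacency(x, r1, r2):
    """A_{r1,r2}(x): r1-cells y such that some r2-cell contains both x and y.
    Read literally, the formula also returns x itself when such a cell exists."""
    if rank[x] != r1:
        return set()
    return {y for y in skeleton(r1)
            if any(x <= z and y <= z for z in skeleton(r2))}

def coadjacency(x, r1, r2):
    """coA_{r1,r2}(x): r1-cells y such that some r2-cell lies inside both x and y."""
    if rank[x] != r1:
        return set()
    return {y for y in skeleton(r1)
            if any(z <= x and z <= y for z in skeleton(r2))}

def upper_incidence(x, r1, r2):
    """B_{r1,r2}(x): r2-cells that contain x."""
    if rank[x] != r1:
        return set()
    return {y for y in skeleton(r2) if x <= y}

def lower_incidence(x, r1, r2):
    """B^T_{r1,r2}(x): r2-cells contained in x."""
    if rank[x] != r1:
        return set()
    return {y for y in skeleton(r2) if y <= x}

node0 = frozenset({0})
edges_at_node0 = upper_incidence(node0, 0, 1)              # the two edges through node 0
nodes_of_face = lower_incidence(frozenset({0, 1, 2}), 2, 0)  # the three nodes of the face
```

The functions return the empty set whenever the rank of `x` differs from `r1`, mirroring the convention stated after each equation.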
Cochain spaces. Data defined over an $\ell$-dimensional CC can be viewed as a collection of functions $\{h_r : \mathcal{X}_r \to \mathbb{R}^{d_r}\}_{r=0}^{\ell}$ $^{1}$. Each of these functions is called a cochain or a cell feature map. The vector space of all cochains over cells of rank $r$ is denoted by $\mathcal{C}_r(\mathcal{X}, \mathbb{R}^{d_r})$ or $\mathcal{C}_r$. The feature associated with a cell $x \in \mathcal{X}_r$ is denoted by $h_r(x)$, $(h_r)_x$, or simply $h_x$.
Higher-order message-passing. Higher-order message-passing (HOMP) (Hajij et al., 2022b) is a general computational framework for processing information supported on higher-order domains by exchanging messages across cells. Let $\mathcal{N} = \{N_1, \dots, N_k\}$ be a collection of neighborhood functions; given an initial cochain $h^{(0)} = h$, HOMP is recursively defined via the following update rule
$$h^{(t+1)}_x = \beta\left(\bigotimes_{i=1}^{k} \bigoplus_{y \in N_i(x)} \mathrm{MLP}^{(t)}_{i,\mathrm{rk}(x)}\left(h^{(t)}_x, h^{(t)}_y\right)\right), \tag{3}$$
where $h^{(t)}_x$ is the feature associated with cell $x \in \mathcal{X}$ at layer $t$, and $\beta$ is a nonlinear activation. In the rest of the paper, we assume $\mathcal{N} \subseteq \mathcal{N}_{nat}$. Similarly to MPNNs, the HOMP framework encompasses many TDL architectures, including architectures for simplicial complexes (Bodnar et al., 2021b), cellular complexes (Hajij et al., 2020; Bodnar et al., 2021a; Giusti et al., 2023), and combinatorial complexes (Hajij et al., 2022a;b).
[Figure: tensor diagram with nodes $\mathcal{C}_0$ and $\mathcal{C}_2$ and edges labeled $A_{0,1}$, $B_{0,2}$, $B^{\top}_{2,0}$.]
Tensor diagrams. Hajij et al. (2022a) introduce tensor diagrams, a DAG notation scheme for navigating the rich space of possible HOMP architectures. Tensor diagrams allow for selective aggregation over different neighborhood functions for different cochain spaces in different layers of the network. The nodes of a tensor diagram represent cochain spaces, and the edges represent neighborhood functions. The signal flows from each level of the diagram to the next via the update rule specified in Equation 3, where aggregation is performed only over neighborhood functions associated with the incoming edges$^{2}$. See Figure 2 for an illustration of a tensor diagram and Hajij et al. (2022b) for an in-depth overview.
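To make the update rule of Equation 3 concrete, here is a minimal sketch of a single HOMP layer with scalar features. It rests on several simplifying assumptions that are not from the paper: both aggregations are plain sums, the learnable MLP for the $i$-th neighborhood function is replaced by a fixed affine map $(h_x, h_y) \mapsto a_i h_x + b_i h_y$ that ignores the rank of $x$, and $\beta$ is a ReLU. The function name `homp_layer` and the toy complex (a path graph with nodes `a`, `b`, `c` and edges `ab`, `bc`) are hypothetical.

```python
def homp_layer(h, neighborhoods, weights):
    """One update of Equation 3 on scalar features (illustrative sketch only).

    h             : dict mapping each cell to a float feature
    neighborhoods : list of dicts, one per neighborhood function N_i,
                    mapping a cell to its neighbours (missing key = empty set)
    weights       : list of (a_i, b_i); stands in for MLP_i as a*h_x + b*h_y
    """
    new_h = {}
    for x in h:
        total = 0.0  # outer aggregation over neighborhood functions (sum, by assumption)
        for N, (a, b) in zip(neighborhoods, weights):
            # inner, permutation-invariant aggregation over the neighbours of x
            total += sum(a * h[x] + b * h[y] for y in N.get(x, ()))
        new_h[x] = max(0.0, total)  # beta = ReLU
    return new_h

# Path graph a - b - c viewed as a 1-dimensional complex.
h0 = {"a": 1.0, "b": 2.0, "c": 3.0, "ab": 10.0, "bc": 20.0}
node_adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}      # A_{0,1} without self-loops
node_to_edge = {"a": ["ab"], "b": ["ab", "bc"], "c": ["bc"]}    # B_{0,1}

h1 = homp_layer(h0, [node_adjacency, node_to_edge], [(1.0, 1.0), (0.0, 0.5)])
```

Dropping a neighborhood function from the list passed to `homp_layer` corresponds to removing the matching incoming edge in a tensor diagram. In this toy run the edge cells receive no messages, so their updated features are zero.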

Section: EXPRESSIVITY LIMITATIONS OF HIGHER-ORDER MESSAGE-PASSING
The expressivity of graph models is often evaluated in terms of their ability to assign different values to non-isomorphic graphs. Similarly, we study HOMP's ability to distinguish non-isomorphic CCs$^{3}$.

Section: A TOPOLOGICAL CRITERION FOR HOMP-INDISTINGUISHABILITY
The main tool we use throughout this section is a topological HOMP-indistinguishability criterion based on the notion of covering spaces, extending the main result of Bamberger (2022) from graph coverings to combinatorial complex coverings.
Definition 4.1 (CC covering). $\tilde{\mathcal{X}}$ is said to cover $\mathcal{X}$ if there exists a surjective rank-preserving map $\rho : \tilde{\mathcal{X}} \to \mathcal{X}$ which is a local isomorphism with respect to natural neighborhood functions (i.e. $\rho$ bijectively maps the set $N(x')$ to $N(\rho(x'))$ for all $x' \in \tilde{\mathcal{X}}$ and $N \in \mathcal{N}_{nat}$).
Examples of CC coverings are depicted in Figures 3, 10, and 11. Explicit constructions of covering CCs can be found in Appendix B. The following theorem shows that complexes sharing a cover are indistinguishable by HOMP models. See Appendix B for a proof of Theorem 4.2 and further discussion.
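The local-isomorphism condition in Definition 4.1 can be checked mechanically on small examples. The sketch below does so in the simplest setting, a graph viewed as a one-dimensional CC, and, as a simplifying assumption, tests only node adjacency rather than every natural neighborhood function. It verifies that the 6-cycle covers the 3-cycle by wrapping around it twice; the helper names `cycle` and `is_covering` are hypothetical.

```python
def cycle(n):
    """Adjacency sets of the n-cycle on nodes 0..n-1 (n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def is_covering(cover, base, rho):
    """Check that rho: cover -> base is surjective and maps the neighbours of
    every node x bijectively onto the neighbours of rho(x).
    Only node adjacency is tested (a simplification of Definition 4.1)."""
    if {rho(x) for x in cover} != set(base):
        return False                      # not surjective
    for x, nbrs in cover.items():
        image = [rho(y) for y in nbrs]
        if len(set(image)) != len(image): # not injective on N(x)
            return False
        if set(image) != base[rho(x)]:    # not onto N(rho(x))
            return False
    return True

wraps_twice = is_covering(cycle(6), cycle(3), lambda i: i % 3)   # a covering map
folded = is_covering(cycle(6), cycle(3), lambda i: i // 2)       # surjective, but not a covering
```

The second map is surjective yet fails the local condition, which shows that surjectivity alone is not enough.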
Theorem 4.2 (HOMP-indistinguishability criterion). Let $\mathcal{X}$ and $\mathcal{X}'$ be CCs such that $|\mathcal{X}_0| = |\mathcal{X}'_0|$. If there exists a CC $\tilde{\mathcal{X}}$ that covers each of the connected components of both $\mathcal{X}$ and $\mathcal{X}'$, then for every HOMP model $M$, $M(\mathcal{X}) = M(\mathcal{X}')$.

Section: TOPOLOGICAL AND METRIC LIMITATIONS
Although CCs are combinatorial objects, they give rise to various metric and topological spaces. The shortest path distance with respect to any neighborhood function defines a metric on the cells of the CC. In addition, if the complex is cellular or simplicial, it can be canonically associated with a topological space. The topological/metric properties of these spaces are invariants of the underlying CC. We prove HOMP's inability to distinguish between complexes based on the following common invariants: (1) the diameter, which measures how "spread out" the complex is; (2) orientability, which captures whether a consistent "side" or direction can be defined across the entire space; (3) planarity, which captures whether the complex can be embedded in $\mathbb{R}^2$; (4) the homology groups, which encode the structure of "$d$-dimensional holes"$^{4}$.
Theorem 4.3 (Topological blindspots). For any invariant $I \in \{\text{diameter}, \text{orientability}, \text{planarity}, \text{homology}\}$ there exists a pair of HOMP-indistinguishable CCs that differ in $I$.
Figure 1 depicts pairs of HOMP-indistinguishable CCs which differ in each of the above invariants. In addition, Appendix B provides a detailed discussion regarding each invariant, as well as a complete proof of Theorem 4.3. To demonstrate the techniques used in the proof, we include a short proof sketch for the case of orientability and planarity.
Proof sketch. Let $\text{Cyl}_{h,p}$ and $\text{Möb}_{h,p}$ be a cylinder and a Möbius strip with height $h$ and cycle length $p$. The cylinder is orientable and planar, while the Möbius strip is neither (see Figure 4 for an illustration). As illustrated in Figure 3, $\text{Cyl}_{h,2p}$ covers both $\text{Cyl}_{h,p}$ and $\text{Möb}_{h,p}$. Intuitively, $\text{Cyl}_{h,2p}$ covers $\text{Cyl}_{h,p}$ by "wrapping" around it twice, and $\text{Möb}_{h,p}$ by "wrapping" and "twisting" around it (formal constructions of both covering maps appear in Appendix B). Since both CCs are connected and have the same number of 0-cells, Theorem 4.2 shows they are HOMP-indistinguishable.

Section: LIFTING AND POOLING
One benefit of combinatorial complexes is their flexibility in incorporating lifting and pooling$^{5}$ methods to construct CCs from graphs. Common graph lifting methods, such as the ones in Bodnar et al. (2021a;b), add meaningful substructures (that standard message-passing cannot detect) as higher-order cells. This results in models that are strictly more expressive than MPNNs. Graph pooling methods, like spectral pooling (Ma et al., 2019), Mapper (Singh et al., 2007; Hajij et al., 2018; Dey et al., 2016), and DiffPool (Ying et al., 2018), coarsen input graphs to enable more efficient learning.
A common feature of many lifting and pooling methods is their ability to generate CCs with a small number of high-order cells that differ in fundamental topological and metric invariants. The sparsity of these cells allows for efficient computation of these invariants, resulting in an efficient way to distinguish the original graphs. However, the following proposition, formally stated and proved in Appendix B.3, suggests that HOMP may still struggle to differentiate between the resulting CCs.
Proposition 4.4. There exist pairs of CCs, generated by standard lifting and pooling methods on graphs (see Figures 10 and 11 in Appendix B.3), that HOMP fails to distinguish, even though they differ in basic topological/metric properties. These properties can be efficiently computed due to the sparsity of higher-order cells.
[Figure: tensor diagram with edge labels "SCL" and "equiv" and nodes labeled $\mathcal{C}_{e_0}$, $\mathcal{C}_{e_2}$, $\mathcal{C}_{e_0+e_1}$, $\mathcal{C}_{2e_0}$, $\mathcal{C}_{2e_0+e_1}$, $\mathcal{C}_{3e_0}$.]

Section: MULTI-CELLULAR NETWORKS
In graph learning, expressivity limitations similar to those shown in Section 4 have been mitigated by architectures that process features defined over tuples of nodes, and in particular by IGNs (Maron et al., 2018; 2019; Keriven & Peyré, 2019; Azizian & Lelarge, 2020). These models are built by stacking equivariant linear layers between high-order tensor spaces defined over the nodes of the graph, interleaved with pointwise non-linearities. We use a similar approach, introducing multi-cellular cochain spaces (as an analogue to IGN tensor spaces) and incorporating their induced equivariant linear updates into the HOMP framework. For an overview of IGNs, see Appendix A.1.
Multi-cellular cochain spaces. Given an $\ell$-dimensional CC $\mathcal{X}$ and an $(\ell + 1)$-tuple $\mathbf{k} \in \mathbb{N}^{\ell+1}$, a $\mathbf{k}$-order multi-cellular cochain is a function
$$h_{\mathbf{k}} : \mathcal{X}_0^{k_0} \times \cdots \times \mathcal{X}_\ell^{k_\ell} \to \mathbb{R}^d.$$
The vector space of multi-cellular cochains, denoted by $\mathcal{C}_{\mathbf{k}}(\mathcal{X}, \mathbb{R}^d)$ or $\mathcal{C}_{\mathbf{k}}$, is called a multi-cellular cochain space. Multi-cellular cochain spaces are a natural generalization of standard cochain spaces, providing a way to represent other types of CC data. For example: (1) $\mathcal{C}_{e_i} \cong \mathcal{C}_i$; (2) $B_{r_1,r_2}$ can be represented as a multi-cellular cochain in $\mathcal{C}_{e_{r_1}+e_{r_2}}$; (3) $(co)A_{r_1,r_2}$ can be represented as a multi-cellular cochain in $\mathcal{C}_{2e_{r_1}}$ (see Appendix C for details). Moreover, multi-cellular cochain spaces recover many linear spaces studied in previous works. For example, $\mathcal{C}_{k e_r}$ matches the feature space of a $k$-IGN layer operating on $r$-cells, and $\mathcal{C}_{e_{r_1}+e_{r_2}}$ corresponds to the input space of the exchangeable matrix layers introduced in Hartford et al. (2018).
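Symmetry group. Given enumerations of the sets $\mathcal{X}_0 = \{x^0_1, \dots, x^0_{n_0}\}, \dots, \mathcal{X}_\ell = \{x^\ell_1, \dots, x^\ell_{n_\ell}\}$, a multi-cellular cochain $h \in \mathcal{C}_{\mathbf{k}}(\mathcal{X}, \mathbb{R}^d)$ can be identified with a tensor $A_h$ defined by
$$(A_h)_{\mathbf{i}_0, \dots, \mathbf{i}_\ell, :} = h\left(x^0_{(\mathbf{i}_0)_1}, \dots, x^0_{(\mathbf{i}_0)_{k_0}}, \dots, x^\ell_{(\mathbf{i}_\ell)_1}, \dots, x^\ell_{(\mathbf{i}_\ell)_{k_\ell}}\right) \tag{4}$$
for multi-indices $\mathbf{i}_0 \in \{1, \dots, n_0\}^{k_0}, \dots, \mathbf{i}_\ell \in \{1, \dots, n_\ell\}^{k_\ell}$. Therefore, $\mathcal{C}_{\mathbf{k}}$ can be identified with the tensor space $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d}$. The group $G = S_{n_0} \times \cdots \times S_{n_\ell}$ acts on $h \in \mathcal{C}_{\mathbf{k}}$ by
$$(\sigma \cdot h)(\mathbf{x}^0, \dots, \mathbf{x}^\ell) = ((\sigma_0, \dots, \sigma_\ell) \cdot h)(\mathbf{x}^0, \dots, \mathbf{x}^\ell) = h(\sigma_0 \cdot \mathbf{x}^0, \dots, \sigma_\ell \cdot \mathbf{x}^\ell), \tag{5}$$
where if $\mathbf{x}^r = (x^r_{j_1}, \dots, x^r_{j_{k_r}}) \in \mathcal{X}_r^{k_r}$ is a tuple of cells, $\sigma_r \cdot \mathbf{x}^r = (x^r_{\sigma_r^{-1}(j_1)}, \dots, x^r_{\sigma_r^{-1}(j_{k_r})})$. In simple terms, the group $G$ acts on $h$ by reordering the cells of each rank independently. Therefore, to ensure independence of cell ordering, we aim to construct $G$-invariant architectures.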
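The action in Equation 5 can be illustrated on the smallest mixed case, a cochain in $\mathcal{C}_{e_0+e_1}$ stored as an $n_0 \times n_1$ array, on which $S_{n_0} \times S_{n_1}$ permutes rows and columns independently. The sketch below is an illustration under assumptions: the example tensor is the node-edge incidence matrix of a path graph, and the helper names `invert`, `act`, and `readout` are hypothetical. It checks that a simple readout (sorted row sums and sorted column sums) does not depend on the cell ordering.

```python
def invert(sigma):
    """Inverse of a permutation given as a list: sigma[i] is the image of i."""
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return inv

def act(sigma0, sigma1, A):
    """(sigma . A)[i][j] = A[sigma0^{-1}(i)][sigma1^{-1}(j)]:
    rows (0-cells) and columns (1-cells) are reordered independently."""
    inv0, inv1 = invert(sigma0), invert(sigma1)
    return [[A[inv0[i]][inv1[j]] for j in range(len(sigma1))]
            for i in range(len(sigma0))]

def readout(A):
    """A G-invariant summary: sorted row sums and sorted column sums."""
    rows = sorted(sum(row) for row in A)
    cols = sorted(sum(col) for col in zip(*A))
    return rows, cols

# Incidence matrix of the path 0 - 1 - 2 with edges {0,1} and {1,2}.
B = [[1, 0],
     [1, 1],
     [0, 1]]
B_permuted = act([1, 2, 0], [1, 0], B)
same_readout = readout(B) == readout(B_permuted)
```

The permuted tensor differs entry-wise from the original, while the readout is unchanged, which is the invariance property the architectures are designed to have.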
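Equivariant updates. Since the space $\mathcal{C}_{\mathbf{k}}$ can be identified with $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d}$, we utilize the basis of equivariant linear layers $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d} \to \mathbb{R}^{n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$, constructed in Maron et al. (2018), to describe the space of equivariant linear maps $\mathcal{C}_{\mathbf{k}} \to \mathcal{C}_{\mathbf{k}'}$. Using this basis, denoted $\{L_\gamma\}_{\gamma \in \Gamma(\mathbf{k}, \mathbf{k}', d, d')}$ (here $\Gamma(\mathbf{k}, \mathbf{k}', d, d')$ is an index set defined in Appendix C), we follow the construction of well-known permutation-invariant architectures (Maron et al., 2018; Zaheer et al., 2017; Hartford et al., 2018; Bronstein et al., 2021), and define learnable equivariant layers
$$F(h) = \beta\left(\sum_{\gamma \in \Gamma(\mathbf{k}, \mathbf{k}', d, d')} w_\gamma L_\gamma(A_h)\right), \tag{6}$$
where $\{w_\gamma\}_{\gamma \in \Gamma(\mathbf{k}, \mathbf{k}', d, d')}$ are learnable parameters and $\beta$ is a non-linearity.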
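The simplest instance of Equation 6 is a layer $\mathcal{C}_{e_r} \to \mathcal{C}_{e_r}$ with $d = d' = 1$ and no bias. In that case the space of permutation-equivariant linear maps $\mathbb{R}^n \to \mathbb{R}^n$ is spanned by two operators, the identity and the map that broadcasts the sum to every entry, which is the layer used in Deep Sets (Zaheer et al., 2017). The sketch below is only this special case, with a ReLU standing in for $\beta$ and hypothetical function names; it is not the general basis of Maron et al. (2018). It verifies equivariance numerically.

```python
def equivariant_layer(h, w_id, w_sum):
    """F(h) = ReLU(w_id * h + w_sum * sum(h) * 1), the two-element basis case."""
    s = sum(h)
    return [max(0.0, w_id * v + w_sum * s) for v in h]

def permute(sigma, h):
    """Move the feature of cell i to position sigma[i]."""
    out = [0.0] * len(h)
    for i, s in enumerate(sigma):
        out[s] = h[i]
    return out

h = [1.0, -2.0, 3.0]
sigma = [2, 0, 1]

# Equivariance: applying the layer and then permuting equals permuting first.
layer_then_permute = permute(sigma, equivariant_layer(h, 1.0, -0.5))
permute_then_layer = equivariant_layer(permute(sigma, h), 1.0, -0.5)
is_equivariant = layer_then_permute == permute_then_layer
```

For higher-order spaces the basis grows quickly, which is the scalability issue that motivates the restricted architecture discussed later in the paper.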
MCN. We incorporate equivariant layers into the HOMP framework by adding new node and edge labels to the tensor diagram scheme (as depicted in Figure 5), defining a new class of TDL architectures we call multi-cellular networks (MCNs). At each layer of an MCN tensor diagram, if $v$ is a node labeled by $\mathcal{C}_{\mathbf{k}}$, we compute a multi-cellular cochain $h^{(v)} \in \mathcal{C}_{\mathbf{k}}$ by
$$h^{(v)}_x = \bigotimes_{u \in \mathrm{pred}(v)} m_{u,v}(x),$$
where $\mathrm{pred}(v)$ denotes the set of predecessor nodes in the diagram, and messages $m_{u,v} \in \mathcal{C}_{\mathbf{k}}$ are computed based on the label of the edge $(u, v)$. If the edge is labeled with "equiv", the message is computed as described in Equation 6. For edges labeled by neighborhood functions, the message follows the standard tensor diagram update rule. A formal definition of the MCN framework can be found in Appendix C. Using nodes labeled by higher-order multi-cellular cochain spaces and equivariant updates improves expressivity. In fact, by using multi-cellular cochain spaces of high order, MCN can reach full expressivity, as proved in Appendix E.
Proposition 5.1 (MCN is fully expressive). If $\mathcal{X}$ and $\mathcal{X}'$ are non-isomorphic CCs, there exists an MCN model $M$ such that $M(\mathcal{X}) \neq M(\mathcal{X}')$.

Section: SCALABLE MULTI-CELLULAR NETWORKS
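Figure 6: $H_{coA_{2,1}}$. [Illustration: a complex $\mathcal{X}$ and its augmented Hasse graph $H_{coA_{2,1}}(\mathcal{X})$ with respect to $coA_{2,1}$.]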
Despite its strong expressive power, implementing MCN in full generality is impractical as the computational complexity and the size of the basis |Γ(k, k ′ , d, d ′ )| grow exponentially with k and k ′ . In this section, we design scalable MCN (SMCN), a more efficient version of MCN that still mitigates many of HOMP's expressivity limitations. Below is an overview and motivation for the SMCN architecture. A formal construction and computational runtime analysis can be found in Appendix D; an implementation guide and empirical runtime evaluation are provided in Appendix G.
First, we restrict SMCN to multi-cellular cochain spaces $\mathcal{C}_{\mathbf{k}}$ with $\sum_{i=0}^{\ell} k_i \leq 3$. Of these, the spaces whose updates incur the heaviest computational overhead are $\mathcal{C}_{3e_r}$ and $\mathcal{C}_{2e_{r_1}+e_{r_2}}$. We replace the updates induced by these spaces with new updates inspired by expressive GNNs, providing a middle ground between expressive power and scalability. These GNNs are applied to graph structures, called augmented Hasse graphs, that capture relational information between cells.
Definition 6.1 (Augmented Hasse graph). The augmented Hasse graph of $\mathcal{X}$ w.r.t. $(co)A_{r_1,r_2}$$^{6}$ is defined by $H_{(co)A_{r_1,r_2}} = (V, E)$, where $V = \mathcal{X}_{r_1}$ and $E = \{(x, y) \mid y \in (co)A_{r_1,r_2}(x)\}$.
See Figure 6 for an illustration of an augmented Hasse graph.
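The construction in Definition 6.1 can be sketched in a few lines for the co-adjacency case. The example below builds the augmented Hasse graph with respect to $coA_{2,1}$ for a hypothetical complex consisting of three triangles, two of which share an edge. Cells are encoded as `frozenset`s and self-loops are dropped by default; both choices, as well as the function name, are assumptions made for this illustration (read literally, the definition also includes a pair $(x, x)$ whenever $x$ contains an $r_2$-cell).

```python
from itertools import combinations

def augmented_hasse_coadjacency(cells_r1, cells_r2, self_loops=False):
    """Augmented Hasse graph w.r.t. coA_{r1,r2}: nodes are the r1-cells, and
    (x, y) is an edge when some r2-cell is contained in both x and y."""
    edges = set()
    for x in cells_r1:
        for y in cells_r1:
            if (x != y or self_loops) and any(z <= x and z <= y for z in cells_r2):
                edges.add((x, y))
    return set(cells_r1), edges

# Three triangles (2-cells); t1 and t2 share the 1-cell {1, 2}, t3 is disjoint.
t1, t2, t3 = frozenset({0, 1, 2}), frozenset({1, 2, 3}), frozenset({4, 5, 6})
two_cells = [t1, t2, t3]
one_cells = {frozenset(p) for t in two_cells for p in combinations(sorted(t), 2)}

nodes, edges = augmented_hasse_coadjacency(two_cells, one_cells)
```

The resulting graph has one node per 2-cell and a single undirected edge, stored as two directed pairs, between the triangles that share a 1-cell.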
Replacing $\mathcal{C}_{2e_{r_1}+e_{r_2}}$ with $\mathcal{C}_{e_{r_1}+e_{r_2}}$. Recall that $\mathcal{C}_{2e_{r_1}+e_{r_2}}$ can be identified with $\mathbb{R}^{n_{r_1}^2 \times n_{r_2} \times d}$. Under the action of $S_{n_{r_1}} \times S_{n_{r_2}}$, a tensor $H \in \mathbb{R}^{n_{r_1}^2 \times n_{r_2} \times d}$ can be viewed as a bag of tensors $\{H_k \in \mathbb{R}^{n_{r_1} \times n_{r_1} \times d}\}_{k \in [n_{r_2}]}$, each of which is considered up to $S_{n_{r_1}}$ permutations. These are the exact objects processed by Subgraph GNNs (Bevilacqua et al., 2021; Frasca et al., 2022; Zhang et al., 2023b), which operate on a set of adjacency matrices corresponding to different subgraphs defined over a fixed set of nodes. Subgraph GNNs are strictly more expressive than MPNNs, demonstrate strong experimental performance, and have quadratic runtime complexity as opposed to $O(n_{r_1}^2 \cdot n_{r_2})$ for $\mathcal{C}_{2e_{r_1}+e_{r_2}} \to \mathcal{C}_{2e_{r_1}+e_{r_2}}$ equivariant updates.
Published as a conference paper at ICLR 2025.
Figure 7: Illustration of an SCL update on $\mathcal{C}_{e_0+e_2}$. We construct a bag $\{\mathcal{X}^x\}_{x \in \mathcal{X}_2}$ of copies of $\mathcal{X}$ with marked 2-cells. This bag is processed by applying a subgraph GNN to the bag $\{H^x_{A_{0,1}}\}_{x \in \mathcal{X}_2} \in \mathcal{C}_{e_0+e_2}(\mathcal{X})$ of the corresponding marked $A_{0,1}$ augmented Hasse graphs. Here, $\mathcal{X}^x$ represents the complex $\mathcal{X}$ with a distinct "marking" feature added to the cell features of cell $x$. Similarly, $H^x_{A_{0,1}}$ denotes the augmented Hasse graph $H_{A_{0,1}}$ with a marking added to the nodes corresponding to the elements of cell $x$.
Following the above discussion, we define the subcomplex layer (SCL) updates. Let $(u, v)$ be a tensor diagram edge labeled by "SCL", connecting layers $t$ and $t+1$ (in which case $v$ and $u$ are both labeled by $\mathcal{C}_{e_{r_1}+e_{r_2}}$). The message $m_{u,v} \in \mathcal{C}_{e_{r_1}+e_{r_2}}$ is computed by
$$m_{u,v}(x, y) = \bigotimes_{r=0, r'=0}^{\ell} \mathrm{MLP}_{r,r'}\left(h^{(t)}_{x,y},\ h^{(t)}_{(co)A_{r_1,r}(x),y},\ h^{(t)}_{x,(co)A_{r_2,r'}(y)},\ h^{(t)}_{x,B_{r_1,r_2}(x)},\ h^{(t)}_{B^{\top}_{r_2,r_1}(y),y}\right),$$
where $h_{Q_1,y} := \bigoplus_{x' \in Q_1} h_{x',y}$ and $h_{x,Q_2} := \bigoplus_{y' \in Q_2} h_{x,y'}$. This update can be viewed as applying a subgraph update to bags of augmented Hasse graphs, as shown in Figure 7. Additional details are provided in Appendix D. For a review of subgraph networks, see Appendix A.1.
Replacing $\mathcal{C}_{3e_r}$ with $\mathcal{C}_{2e_r}$. Since $\mathcal{C}_{3e_r}$ can be identified with $\mathbb{R}^{n_r^3 \times d}$, equivariant linear layers of the form $L : \mathcal{C}_{3e_r} \to \mathcal{C}_{3e_r}$ can be identified with 3-IGN layers acting on augmented Hasse graphs of the form $H_{(co)A_{r,r'}}$. The GNN literature offers several candidates for efficient 3-IGN substitutes. The first option we considered is PPGN (Maron et al., 2019), which matches 3-IGN's 3-WL expressive power with a runtime of $O(|V|^{2.s})$. Another option is using subgraph networks with node marking, which results in the SCL update rule above with $r_1 = r_2 = r$. These networks have a runtime of $O(|V| \cdot |E|)$ and are strictly more expressive than MPNNs (> 2-WL), but are less powerful than 3-IGNs (Frasca et al., 2022). We experimented with both versions and found no significant performance improvement using PPGN. Therefore, we continue with the subgraph version, but note that, since PPGN can implement subgraph networks, all theoretical results hold for the PPGN case as well.

Section: EXPRESSIVE POWER OF SCALABLE MULTI-CELLULAR NETWORKS
Topological and metric properties. In Section 4.2, we proved that HOMP models cannot distinguish CCs based on diameter, orientability, planarity or homology. We now examine SMCN with respect to each of these limitations. First, SMCN fully mitigates HOMP's inability to compute diameters. Proposition 6.2. Any pair of CCs with different diameters can be distinguished by an SMCN model.
Next, SMCN can distinguish between a cylinder and a Möbius strip, implying that it is strictly better than HOMP at detecting planarity and orientability. Proposition 6.3. There exists an SMCN model that separates the Möbius strip and the cylinder.
Finally, we offer two results demonstrating SMCN's ability to distinguish CCs based on their homology groups. The first result examines the 0-th homology group. Proposition 6.4. Any pair of CCs with distinct 0-th homology groups can be distinguished by SMCN.
The second result generalizes to homology groups of any order in the case of two-dimensional surfaces embedded in $\mathbb{R}^3$.
Proposition 6.5. Let $\mathcal{X}, \mathcal{X}'$ be a pair of CCs whose underlying topology corresponds to a 2-dimensional surface (with or without boundary) embeddable in $\mathbb{R}^3$. If $\mathcal{X}$ and $\mathcal{X}'$ differ in any homology group, there exists an SMCN model that distinguishes them.
A full exploration of SMCN's ability to capture homology groups of any order, orientability and planarity is left for future work. Collectively, Propositions 6.2-6.5 suggest that SMCN is strictly better than HOMP at leveraging topological properties of CCs. Rigorous formulations and proofs of these propositions appear in Appendix F. The following is a proof sketch for Proposition 6.3.
Proof sketch of Proposition 6.3. The key to distinguishing $\text{Cyl}_{h,p}$ and $\text{Möb}_{h,p}$ is their boundary 1-cells. In our case, these are the 1-cells that are incident to exactly one 2-cell, i.e. their $B_{1,2}$-degree is 1, so they can easily be detected by an SMCN model. As illustrated in Figure 8, the boundary of $\text{Cyl}_{h,p}$ forms two cycles of length $p$ while the boundary of $\text{Möb}_{h,p}$ forms a single cycle of length $2p$. This is a standard example of a pair of graphs that are indistinguishable by MPNNs but are distinguishable by expressive graph models such as subgraph networks. We can use an SCL update to simulate a subgraph network on the boundaries, separating the two CCs.
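The graph-level fact used in the proof sketch can be reproduced with a short experiment. The sketch below takes $p = 3$ and compares two disjoint 3-cycles with a single 6-cycle. It uses 1-WL colour refinement as a stand-in for an MPNN and a bag of node-marked refinements as a stand-in for a subgraph network with node marking; both substitutions, and all function names, are assumptions of this illustration rather than the SMCN implementation.

```python
from collections import Counter

def cycles(*lengths):
    """Disjoint union of cycles with the given lengths, as adjacency lists."""
    adj, start = {}, 0
    for n in lengths:
        for i in range(n):
            adj[start + i] = [start + (i - 1) % n, start + (i + 1) % n]
        start += n
    return adj

def wl_histogram(adj, marked=None):
    """Colour refinement; colours are nested tuples, so they are directly
    comparable across graphs. `marked` gets initial colour 1, all others 0."""
    colors = {v: int(v == marked) for v in adj}
    for _ in range(len(adj)):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

def marked_bag(adj):
    """Multiset of refinement histograms, one per choice of marked node."""
    return Counter(frozenset(wl_histogram(adj, marked=v).items()) for v in adj)

two_short_cycles = cycles(3, 3)   # boundary pattern of the cylinder for p = 3
one_long_cycle = cycles(6)        # boundary pattern of the Möbius strip for p = 3

plain_equal = wl_histogram(two_short_cycles) == wl_histogram(one_long_cycle)
marked_equal = marked_bag(two_short_cycles) == marked_bag(one_long_cycle)
```

Without marking, every node has the same colour in both graphs, so the histograms coincide. With marking, the mark reaches every node of the long cycle but never leaves its own component in the two short cycles, so the bags differ.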
Lifting and Pooling. In Section 4.3, we discuss HOMP's inability to expressively leverage the sparse higher-order cells generated by common lifting and pooling methods. The next proposition, proved in Appendix F.2, suggests that SMCN is able to leverage this information to a greater extent.
Proposition 6.6. There exist CCs, generated from graphs by standard lifting and pooling methods, that HOMP cannot distinguish but SMCN can. The SMCN model can be constructed to have runtime $O(m_{deg} \cdot n_0 \cdot n_2)$, where $m_{deg}$ is the maximal degree w.r.t. any natural neighborhood function.

Section: EXPERIMENTS
The lack of CC benchmarks has been recognized as a challenge in TDL (Papamarkou et al., 2024). To address this, we introduce three novel CC benchmarks designed to assess the ability of TDL models to capture topological/metric properties, and evaluate both SMCN and other HOMP architectures on them. In addition, we adopt the setup of Bodnar et al. (2021a), applying cyclic lifting on real-world graph benchmarks. For an in-depth discussion of experimental details, see Appendix G. A comparison of the expressive power of SMCN and all baselines is available in Appendix F.3.
Torus dataset. The torus dataset consists of pairs of CCs, each comprising one or more disjoint tori (see Definition B.4). These pairs are chosen to be HOMP-indistinguishable, despite differing in basic metric/topological properties: they either have distinct homology groups, or they differ in the diameters of some of the components. Models are evaluated by counting the number of pairs they can separate in a statistically significant way, following the protocol outlined in Wang & Zhang (2024). In our experiments, HOMP was unable to distinguish any of the pairs, while SMCN was able to distinguish all pairs.
Predicting topological and metric properties. We construct two additional benchmarks in which models are tasked with predicting topological and metric properties of CCs lifted from ZINC (Sterling & Irwin, 2015) molecular graphs. The predicted properties are the (0, 1, 2)-cross-diameter (Equation 24) and the second-order Betti number (the rank of the second homology group). In Appendix B we show that HOMP is incapable of fully capturing either property. Table 2 presents both the MSE and the accuracy of predicting the target values (18 possible values for cross-diameter and 6 for Betti numbers) across three TDL models: SMCN, CIN, and a custom HOMP architecture tailored for this prediction task. The benchmarks detailed above empirically verify SMCN's superior ability to capture topological/metric properties of CCs. This holds for both synthetically generated data and lifted molecular graphs. These results complement the theoretical results from Sections 4 and 6.1, demonstrating that the expressivity gains of SMCN lead to improved learning of topological/metric invariants.
Real-world graph benchmarks. We evaluate SMCN on ZINC-12K (Sterling & Irwin, 2015), MOL-HIV, and MOLESOL (Hu et al., 2020). We compare SMCN to several HOMP baselines as well as a range of expressive graph architectures. As seen in Table 1, SMCN outperforms both HOMP architectures and expressive graph methods across all three benchmarks, underscoring the value of expressively leveraging higher-order topological information on graphs.

Section: CONCLUSION
In the first part of the paper, we analyzed the expressivity limitations of HOMP from a topological perspective, proving its inability to capture the diameter, orientability, planarity, and homology of input CCs. Additionally, we showed that there exist CCs generated through common graph lifting methods which are HOMP-indistinguishable despite differing in easy-to-compute topological invariants. In the second part of the paper, we introduced MCN, inspired by k-IGNs, and its more scalable version, SMCN. We proved that, analogously to IGNs, MCN can reach full expressivity for CCs. We additionally showed that SMCN tractably addresses many of HOMP's expressivity limitations. Finally, we presented three novel benchmarks designed to evaluate TDL architectures' ability to capture topological/metric information. We evaluated SMCN on these benchmarks as well as on real-world graph benchmarks. SMCN outperformed HOMP architectures on the expressivity benchmarks, empirically supporting our theoretical findings. On the real-world graph benchmarks, SMCN outperformed both HOMP architectures and expressive graph architectures, demonstrating the value of expressively leveraging higher-order topological information.
Limitations and future work. The components that make SMCN more expressive have runtime that scales super-linearly in the number of cells, making SMCN intractable for larger CCs. Future research may aim to design more scalable alternatives to SMCN. Additionally, although we have shown that SMCN is strictly better than HOMP in distinguishing CCs based on orientability and homology, it remains unclear if it is able to fully capture these properties. Future work can explore SMCN limitations in expressing topological and metric invariants. Finally, future research can aim to develop more complex benchmarks that include a broader range of topological properties.
The Appendix is organized as follows:
• In Appendix A.1 we give an overview of expressive GNN architectures relevant to this paper.
• In Appendix A.2 we give an illustrative example of CC neighborhood functions.
• In Appendix B we prove results from Section 4 regarding the expressivity of HOMP architectures. In B.1 we prove Theorem 4.2 (topological HOMP-indistinguishability criterion), in B.2 we prove Theorem 4.3 (topological blindspots) and in B.3 we address lifting and pooling.
• In Appendix C we give an in-depth description of the MCN framework, introduced in Section 5.
• In Appendix D we give an in-depth description of the SMCN framework, introduced in Section 6.
• In Appendix E we give a formal definition of CC isomorphism and prove Proposition 5.1 (MCN can distinguish any pair of non-isomorphic CCs).
• In Appendix F we analyze the expressive power of SMCN, proving all results from Section 6.1.
• In Appendix G we give further details regarding the results presented in Section 7 as well as the experimental setup.
Section: A BACKGROUND
More specifically, given a graph $G$ with adjacency matrix $A \in \mathbb{R}^{n \times n}$ and node feature matrix $X \in \mathbb{R}^{n \times d}$, an IGN first encodes this graph as a tensor $T \in \mathbb{R}^{n^2 \times (d+1)}$ where $T_{:,:,1}$ holds the adjacency matrix $A$ and the last $d$ channels hold the node features on their diagonal, i.e. $T_{i,i,j} = X_{i,j}$ and $T_{i_1,i_2,j} = 0$ for $i_1 \neq i_2$. The symmetric group $S_n$ acts naturally on $\mathbb{R}^{n^2 \times (d+1)}$ by:
$$(\sigma \cdot T)_{i,j,k} = T_{\sigma^{-1}(i),\sigma^{-1}(j),k}, \quad \sigma \in S_n. \quad (7)$$
Notice that for any graph tensor $T$ and permutation $\sigma \in S_n$, the tensors $T$ and $\sigma \cdot T$ represent the same graph. This action can be easily generalized to the tensor space $\mathbb{R}^{n^k \times c}$ by:
$$(\sigma \cdot T)_{i_1,\dots,i_k,j} = T_{\sigma^{-1}(i_1),\dots,\sigma^{-1}(i_k),j}. \quad (8)$$
For any integers $k, k', c, c'$, Maron et al. (2018) find a basis for the space of all linear maps $L : \mathbb{R}^{n^k \times c} \to \mathbb{R}^{n^{k'} \times c'}$ which satisfy
$$L(\sigma \cdot T) = \sigma \cdot L(T). \quad (9)$$
These are called equivariant linear maps. A k-IGN stacks layers of the form
$$U(T) = \beta\Big(\sum_{\gamma \in \Gamma} w_\gamma L_\gamma(T)\Big), \quad (10)$$
where $\{L_\gamma\}_{\gamma \in \Gamma}$ is a basis of the space of equivariant layers from $\mathbb{R}^{n^{k_1} \times c_1}$ to $\mathbb{R}^{n^{k_2} \times c_2}$ for some $k_1, k_2 \leq k$ and $c_1, c_2 \in \mathbb{N}$.

Section: A.2 NEIGHBORHOOD FUNCTION ILLUSTRATION
The following is an example of the standard neighborhood functions introduced in Section 3. For the CC in Figure 9 the following relations hold:
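• $\{A\} \in \mathcal{A}_{0,1}(\{B\})$.
• $\{A\} \notin \mathcal{A}_{0,1}(\{D\})$.
• $\{A\} \in \mathcal{A}_{0,2}(\{D\})$.
• $\{C, D\} \in co\mathcal{A}_{1,0}(\{A, C\})$.
• $\{C, D\} \notin co\mathcal{A}_{1,0}(\{A, B\})$.
• $\{C, D\} \in \mathcal{A}_{1,2}(\{A, B\})$.
• $\{C, D, E\} \in co\mathcal{A}_{2,0}(\{E, F, H\})$.
• $\{C, D, E\} \notin co\mathcal{A}_{2,1}(\{E, F, H\})$.
• $\{A, B, C, D\} \in co\mathcal{A}_{2,1}(\{C, D, E\})$.
• $\{B, D\} \in \mathcal{B}_{0,1}(\{D\})$.
• $\{F, G, H\} \in \mathcal{B}_{0,2}(\{G\})$.
• $\{B\} \in \mathcal{B}^{\top}_{1,0}(\{B, D\})$.
• $\{B\} \notin \mathcal{B}^{\top}_{1,0}(\{C, D\})$.
• $\{B\} \in \mathcal{B}^{\top}_{2,0}(\{A, B, C, D\})$.
• $\{B\} \notin \mathcal{B}^{\top}_{2,0}(\{C, D, E\})$.
HOMP can be viewed as performing parallel message passing on the connectivity structures defined by these neighborhood functions.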
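Relations of the kind listed above can be computed mechanically. The following is a minimal sketch under stated assumptions: the definitions are inferred from the listed examples rather than copied from Section 3 (incidence relates a cell to the cells of another rank that contain it or are contained in it; (co)adjacency relates same-rank cells that share a cell of another rank), the helper names are hypothetical, and the toy complex is a filled triangle with a pendant edge, not the CC of Figure 9.

```python
def cells_of_rank(cc, r):
    """All cells of rank r; cc maps each cell (a frozenset of nodes) to its rank."""
    return [c for c, rank in cc.items() if rank == r]

def incidence(cc, x, r2):
    """Rank-r2 cells containing x (if r2 > rk(x)) or contained in x (if r2 < rk(x))."""
    if r2 > cc[x]:
        return {y for y in cells_of_rank(cc, r2) if x < y}
    return {y for y in cells_of_rank(cc, r2) if y < x}

def adjacency(cc, x, r2):
    """Cells of the same rank as x that share a rank-r2 cell with x.

    r2 > rk(x) gives adjacency (a common containing cell);
    r2 < rk(x) gives co-adjacency (a common contained cell).
    """
    return {y for z in incidence(cc, x, r2) for y in incidence(cc, z, cc[x])} - {x}

fs = frozenset
# Hypothetical toy complex: nodes A-D, edges AB, BC, CA, CD, one 2-cell ABC.
cc = {fs("A"): 0, fs("B"): 0, fs("C"): 0, fs("D"): 0,
      fs("AB"): 1, fs("BC"): 1, fs("CA"): 1, fs("CD"): 1,
      fs("ABC"): 2}

node_adjacent_to_A = adjacency(cc, fs("A"), 1)    # nodes sharing an edge with A
edges_coadjacent_to_CD = adjacency(cc, fs("CD"), 0)  # edges sharing a node with CD
```

In this toy complex, D is not adjacent to A through 1-cells, while the edges BC and CA are co-adjacent to CD because they share the node C.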

Section: B EXPRESSIVITY LIMITATIONS OF HIGHER-ORDER MESSAGE-PASSING
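Theorem B.1. Let $\mathcal{X}$ and $\mathcal{X}'$ be two CCs such that $|\mathcal{X}_0| = |\mathcal{X}'_0|$. If $\mathcal{X}$ and $\mathcal{X}'$ admit decompositions into connected components
$$\mathcal{X} = \bigsqcup_{Z \in C(\mathcal{X})} Z, \qquad \mathcal{X}' = \bigsqcup_{Z' \in C(\mathcal{X}')} Z', \quad (11)$$
such that there exists a CC $\tilde{\mathcal{X}}$ that covers each of the connected components $Z \in C(\mathcal{X})$, $Z' \in C(\mathcal{X}')$, then for every HOMP model $M$, $M(\mathcal{X}) = M(\mathcal{X}')$.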
A combinatorial complex X is said to be connected if its Hasse graph, defined by G = (V, E) with V = X and E = {(x, y) | x ⊆ y, rk(x) = rk(y) -1}, is connected. To prove Theorem B.1, we first state and prove two lemmas.
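Lemma B.2. Let $\rho : \tilde{\mathcal{X}} \to \mathcal{X}$ be a covering map. In addition, let $M$ be a HOMP model with $T$ layers, and let $h^{(t)}_x$ and $\tilde{h}^{(t)}_{x'}$ denote the cell feature maps of $\mathcal{X}$ and $\tilde{\mathcal{X}}$ at layer $t$ evaluated on cells $x \in \mathcal{X}$ and $x' \in \tilde{\mathcal{X}}$ respectively. Under these conditions, $\tilde{h}^{(t)}_{x'} = h^{(t)}_{\rho(x')}$ for $t = 0, \dots, T$ and $x' \in \tilde{\mathcal{X}}$.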
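Proof. We use induction on $t$. For $t = 0$, as both CCs have no initial cellular feature maps, HOMP initializes $h^{(0)}_x$, $\tilde{h}^{(0)}_{x'}$ by assigning a constant feature to all cells and the claim holds trivially. Assume the claim holds for some $t \in \{0, \dots, T\}$. The HOMP update rule reads:
$$h^{(t+1)}_x = \beta\Big(\bigotimes_{\mathcal{N} \in \mathcal{N}_{nat}} \bigoplus_{y \in \mathcal{N}(x)} \mathrm{MLP}^{(t)}_{\mathcal{N},\mathrm{rk}(x)}\big(h^{(t)}_x, h^{(t)}_y\big)\Big), \qquad \tilde{h}^{(t+1)}_{x'} = \beta\Big(\bigotimes_{\mathcal{N} \in \mathcal{N}_{nat}} \bigoplus_{y' \in \mathcal{N}(x')} \mathrm{MLP}^{(t)}_{\mathcal{N},\mathrm{rk}(x')}\big(\tilde{h}^{(t)}_{x'}, \tilde{h}^{(t)}_{y'}\big)\Big). \quad (12)$$
Since $\rho$ is a covering map, $\mathcal{N}(x')$ is bijectively mapped to $\mathcal{N}(\rho(x'))$ for every $x' \in \tilde{\mathcal{X}}$ and every neighborhood function $\mathcal{N} \in \mathcal{N}_{nat}$. Additionally, $\mathrm{rk}(\rho(x')) = \mathrm{rk}(x')$. This, along with the fact that $\bigoplus$ is permutation invariant and the induction hypothesis, implies that:
$$\bigoplus_{y' \in \mathcal{N}(x')} \mathrm{MLP}^{(t)}_{\mathcal{N},\mathrm{rk}(x')}\big(\tilde{h}^{(t)}_{x'}, \tilde{h}^{(t)}_{y'}\big) = \bigoplus_{y \in \mathcal{N}(\rho(x'))} \mathrm{MLP}^{(t)}_{\mathcal{N},\mathrm{rk}(\rho(x'))}\big(h^{(t)}_{\rho(x')}, h^{(t)}_y\big). \quad (13)$$
Thus, combining Equation 12 and Equation 13, we get $\tilde{h}^{(t+1)}_{x'} = h^{(t+1)}_{\rho(x')}$.
Lemma B.3. If $\mathcal{X}$ is connected and $\rho : \tilde{\mathcal{X}} \to \mathcal{X}$ is a covering map, then $\forall x \in \mathcal{X}$, $|\rho^{-1}(x)| = \frac{|\tilde{\mathcal{X}}_0|}{|\mathcal{X}_0|}$.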
Proof. Since $\rho$ is surjective and rank-preserving, the above is equivalent to $\forall x, y \in \mathcal{X}$, $|\rho^{-1}(y)| = |\rho^{-1}(x)|$. Since $\mathcal{X}$ is connected, it suffices to show that this equality holds for any $x, y \in \mathcal{X}$ such that $y \in \mathcal{N}(x)$ for some function $\mathcal{N} \in \mathcal{N}_{nat}$. We first show that for any natural neighborhood function $\mathcal{N} \in \mathcal{N}_{nat}$ and cell $x \in \mathcal{X}$ the sets $\{\mathcal{N}(x') \mid x' \in \rho^{-1}(x)\}$ are pairwise disjoint. To see this, assume by contradiction that for a pair of cells $x'_1, x'_2 \in \rho^{-1}(x)$ we have $\mathcal{N}(x'_1) \cap \mathcal{N}(x'_2) \neq \emptyset$. If $z' \in \mathcal{N}(x'_1) \cap \mathcal{N}(x'_2)$, then there is a neighborhood function $\mathcal{N}^* \in \mathcal{N}_{nat}$ such that $x'_1, x'_2 \in \mathcal{N}^*(z')$. Given that $\rho(x'_1) = \rho(x'_2)$, this would imply that $\rho$ is not injective on $\mathcal{N}^*(z')$, contradicting the definition of a covering map. Now, since $y \in \mathcal{N}(x)$, for any $x' \in \rho^{-1}(x)$ there exists a $y' \in \mathcal{N}(x')$ such that $\rho(y') = y$. Since the sets $\{\mathcal{N}(x') \mid x' \in \rho^{-1}(x)\}$ are pairwise disjoint, this implies that $|\rho^{-1}(y)| \geq |\rho^{-1}(x)|$. Since $y \in \mathcal{N}(x)$, there exists a neighborhood function $\mathcal{N}^* \in \mathcal{N}_{nat}$ such that $x \in \mathcal{N}^*(y)$, implying by the same reasoning as above that $|\rho^{-1}(y)| \leq |\rho^{-1}(x)|$. We thus have $|\rho^{-1}(y)| = |\rho^{-1}(x)|$, which concludes the proof.
We are now ready to prove Theorem B.1.
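Proof. Let $\tilde{\mathcal{X}}$ be a combinatorial complex that covers all connected components $Z \in C(\mathcal{X})$ and $Z' \in C(\mathcal{X}')$ via the maps $\{\rho_Z\}_{Z \in C(\mathcal{X})}$ and $\{\rho_{Z'}\}_{Z' \in C(\mathcal{X}')}$ respectively. Let $M$ be a HOMP model with $T$ layers and let $h^{(t)}$, $h'^{(t)}$, and $\tilde{h}^{(t)}$ denote the cell feature maps of $\mathcal{X}$, $\mathcal{X}'$, and $\tilde{\mathcal{X}}$ respectively at layer $t$. Lemma B.2 implies that for every $Z \in C(\mathcal{X})$, $Z' \in C(\mathcal{X}')$ and every $z \in Z$, $z' \in Z'$ we have
$$h^{(T)}_z = \tilde{h}^{(T)}_y \ \ \forall y \in \rho_Z^{-1}(z), \qquad h'^{(T)}_{z'} = \tilde{h}^{(T)}_y \ \ \forall y \in \rho_{Z'}^{-1}(z'). \quad (14)$$
This implies that the sets of unique values corresponding to the multisets $\{\!\{h^{(T)}_x \mid x \in \mathcal{X}\}\!\}$, $\{\!\{h'^{(T)}_{x'} \mid x' \in \mathcal{X}'\}\!\}$ and $\{\!\{\tilde{h}^{(T)}_y \mid y \in \tilde{\mathcal{X}}\}\!\}$ are the same. Let $n_y$, $n'_y$, $\tilde{n}_y$ be the number of times the value $\tilde{h}^{(T)}_y$ appears in the multisets $\{\!\{h^{(T)}_x \mid x \in \mathcal{X}\}\!\}$, $\{\!\{h'^{(T)}_{x'} \mid x' \in \mathcal{X}'\}\!\}$ and $\{\!\{\tilde{h}^{(T)}_y \mid y \in \tilde{\mathcal{X}}\}\!\}$ respectively. Since each of $Z$, $Z'$ is connected, we can use Lemma B.3 to get that $\forall z \in Z$, $\forall z' \in Z'$, $|\rho_Z^{-1}(z)| = \frac{|\tilde{\mathcal{X}}_0|}{|Z_0|}$ and $|\rho_{Z'}^{-1}(z')| = \frac{|\tilde{\mathcal{X}}_0|}{|Z'_0|}$. This implies that $\forall y \in \tilde{\mathcal{X}}$:
$$n_y = \tilde{n}_y \cdot \Big(\sum_{Z \in C(\mathcal{X})} \frac{|Z_0|}{|\tilde{\mathcal{X}}_0|}\Big), \qquad n'_y = \tilde{n}_y \cdot \Big(\sum_{Z' \in C(\mathcal{X}')} \frac{|Z'_0|}{|\tilde{\mathcal{X}}_0|}\Big). \quad (15)$$
Since $\sum_{Z \in C(\mathcal{X})} |Z_0| = |\mathcal{X}_0| = |\mathcal{X}'_0| = \sum_{Z' \in C(\mathcal{X}')} |Z'_0|$, this implies that $\forall y \in \tilde{\mathcal{X}}$, $n_y = n'_y$.
We have shown that the set of unique values corresponding to the multisets $\{\!\{h^{(T)}_x \mid x \in \mathcal{X}\}\!\}$ and $\{\!\{h'^{(T)}_{x'} \mid x' \in \mathcal{X}'\}\!\}$ is the same, and that the number of times each value appears in the multisets is the same; thus the two multisets are equal. Since the readout of a HOMP model is a function of this multiset, $\mathcal{X}$ and $\mathcal{X}'$ are indistinguishable by HOMP.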

Section: B.2 TOPOLOGICAL AND METRIC LIMITATIONS
In this section, we rigorously state and prove all results regarding HOMP's inability to express topological/metric properties, presented in Section 4. We begin by defining the ℓ-dimensional torus CCs. As we will later see, this class provides us with examples of indistinguishable CCs that differ in both the diameter and all homology groups.
ℓ-dimensional torus CCs. An ℓ-dimensional torus is a Cartesian product of ℓ cycles. More formally:
Definition B.4 (ℓ-dimensional torus CCs). For a sequence of integers $p_1, \dots, p_\ell$, the torus $T_{p_1,\dots,p_\ell}$ is a combinatorial complex $(S, \mathcal{X}, \mathrm{rk})$ defined by:
$$S = [p_1] \times \cdots \times [p_\ell], \quad (16)$$
$$\mathcal{X}_r = \{s^k \mid s \in S,\ k \in \{0,1\}^\ell,\ k_1 + \cdots + k_\ell = r\}, \quad (17)$$
where $s^k$ is defined by:
$$s^k = \{s + k' \mid k' \in \{0,1\}^\ell,\ k' \leq k\}. \quad (18)$$
The sum $s + k'$ is coordinate-wise, where at coordinate $j$ the result is taken modulo $p_j$, and $k' \leq k$ if $k'_j \leq k_j$, $\forall j \in \{1, \dots, \ell\}$.
By slight abuse of notation, we sometimes refer to the set of cells of the torus by T p1,...,p ℓ as well.
We note that the torus T p1,...,p ℓ as defined above is only one possible realization of the ℓ-dimensional torus as a combinatorial complex. An example of a two-dimensional torus can be seen in Figure 1(a).
As the next lemma shows, all ℓ dimensional tori are locally isometric.
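Lemma B.5. Let $T_{p_1,\dots,p_\ell}$ and $T_{p'_1,\dots,p'_\ell}$ be two ℓ-dimensional tori such that $\forall j \in \{1, \dots, \ell\}$, $p_j, p'_j \geq 3$. The torus $T_{p_1 \cdot p'_1,\dots,p_\ell \cdot p'_\ell}$ covers both $T_{p_1,\dots,p_\ell}$ and $T_{p'_1,\dots,p'_\ell}$.
Proof. Denote $p = (p_1, \dots, p_\ell)$, $p' = (p'_1, \dots, p'_\ell)$, $\tilde{p} = (\tilde{p}_1, \dots, \tilde{p}_\ell) = (p_1 \cdot p'_1, \dots, p_\ell \cdot p'_\ell)$. Additionally, denote by $S, S', \tilde{S}$ and $\mathcal{X}, \mathcal{X}', \tilde{\mathcal{X}}$ the node and cell sets of $T_p$, $T_{p'}$ and $T_{\tilde{p}}$ respectively. Define $\rho : \tilde{S} \to S$, $\rho' : \tilde{S} \to S'$ by:
$$\rho(\tilde{s}) = \tilde{s} \bmod p, \qquad \rho'(\tilde{s}) = \tilde{s} \bmod p', \quad (19)$$
where $\tilde{s} \bmod p := (\tilde{s}_1 \bmod p_1, \dots, \tilde{s}_\ell \bmod p_\ell)$. We extend $\rho$ and $\rho'$ to $\tilde{\mathcal{X}}$ by $\rho(x) = \{\rho(s) \mid s \in x\}$. We now prove that $\rho$ is a covering map. We start by showing that $\forall r \in \{0, \dots, \ell\}$, $\rho(\tilde{\mathcal{X}}_r) = \mathcal{X}_r$ (i.e. $\rho$ is rank-preserving). Recall that all elements of $\tilde{\mathcal{X}}_r$ are of the form $\tilde{s}^k$ for some $\tilde{s} \in \tilde{S}$ and $k \in \{0,1\}^\ell$ such that $k_1 + \cdots + k_\ell = r$. Since each $p_j$ divides $\tilde{p}_j$, for every $k' \leq k$:
$$\big((\tilde{s} + k') \bmod \tilde{p}\big) \bmod p = \big((\tilde{s} \bmod p) + (k' \bmod p)\big) \bmod p = (\rho(\tilde{s}) + k') \bmod p. \quad (20)$$
Therefore,
$$\rho(\tilde{s}^k) = \rho(\tilde{s})^k \in \mathcal{X}_r \quad (21)$$
and $\rho$ is rank-preserving. To show that $\rho$ is a covering map, all that remains is to show that it preserves natural neighborhood functions and that it is surjective. For the former, notice that since $\rho$ is defined on the node set $\tilde{S}$, for every $x, y, z \in \tilde{\mathcal{X}}$ we have:
• $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)$.
• $x, y \subseteq z \Rightarrow \rho(x), \rho(y) \subseteq \rho(z)$.
• $z \subseteq x, y \Rightarrow \rho(z) \subseteq \rho(x), \rho(y)$.
Thus, $\rho$ preserves all natural neighborhood functions. Finally, since $p_1, \dots, p_\ell \geq 3$ it is easy to check that for any $x, y \in \tilde{\mathcal{X}}$ and $\mathcal{N} \in \mathcal{N}_{nat}$:
$$y \in \mathcal{N}(x) \Rightarrow \rho(x) \neq \rho(y). \quad (22)$$
This implies that $\rho$ is a covering map. An equivalent argument shows that $\rho'$ is also a covering map, completing the proof.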
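The covering property in Lemma B.5 can be sanity-checked numerically at the level of nodes. The sketch below is an illustration only: it checks the $\mathcal{A}_{0,1}$ neighborhoods of 0-cells and not every natural neighborhood function, it indexes coordinates from 0 rather than using $[p]$, and the function names are hypothetical.

```python
from itertools import product

def torus_neighbors(s, p):
    """Nodes adjacent to s in the torus T_p, viewed as a product of cycles."""
    out = set()
    for j in range(len(p)):
        for step in (-1, 1):
            t = list(s)
            t[j] = (t[j] + step) % p[j]
            out.add(tuple(t))
    return out

def is_node_covering(p_big, p_small):
    """Check that rho(s) = s mod p_small maps every node neighbourhood of
    T_{p_big} bijectively onto the neighbourhood of the image node."""
    def rho(s):
        return tuple(a % m for a, m in zip(s, p_small))
    for s in product(*(range(m) for m in p_big)):
        nbrs = torus_neighbors(s, p_big)
        image = {rho(t) for t in nbrs}
        if image != torus_neighbors(rho(s), p_small) or len(image) != len(nbrs):
            return False
    return True

p, p_prime = (3, 4), (4, 3)
p_tilde = tuple(a * b for a, b in zip(p, p_prime))  # (12, 12)
covers_both = is_node_covering(p_tilde, p) and is_node_covering(p_tilde, p_prime)
# With a side of length 2 the two neighbours along that axis coincide,
# so the map is no longer locally injective.
fails_for_short_cycles = not is_node_covering((4, 4), (2, 2))
```

The second check illustrates why the lemma requires $p_j, p'_j \geq 3$.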
Lemma B.5 gives rise to the following useful corollary.
Corollary B.6. If $T_{p_1,\dots,p_\ell}$ and $T_{p'_1,\dots,p'_\ell}$ are ℓ-dimensional tori such that $p_1 \cdots p_\ell = p'_1 \cdots p'_\ell$ (i.e. $|(T_{p_1,\dots,p_\ell})_0| = p_1 \cdots p_\ell = p'_1 \cdots p'_\ell = |(T_{p'_1,\dots,p'_\ell})_0|$), then they are indistinguishable by HOMP.
Note that tori with the same number of nodes can still differ on a number of topological and metric properties. In the following we use the family of ℓ-dimensional tori to produce examples of topologically/metrically distinct CCs that are indistinguishable by HOMP.
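Diameter. For a given adjacency neighborhood function $(co)\mathcal{A}_{r_1,r_2}$, the $(r_1, r_2)$-diameter of a combinatorial complex $\mathcal{X}$ is defined by:
$$\mathrm{diam}_{(co)\mathcal{A}_{r_1,r_2}}(\mathcal{X}) = \max_{x, x' \in \mathcal{X}_{r_1}} d_{(co)\mathcal{A}_{r_1,r_2}}(x, x'), \quad (23)$$
where $d_{(co)\mathcal{A}_{r_1,r_2}}$ is the shortest path distance with respect to the neighborhood function $(co)\mathcal{A}_{r_1,r_2}$. Additionally, for $k \in \{1, \dots, \ell\}$, the $(r_1, r_2, k)$-cross-diameter is defined by:
$$\mathrm{diam}^{k}_{(co)\mathcal{A}_{r_1,r_2}}(\mathcal{X}) = \max_{x \in \mathcal{X}_{r_1},\, y \in \mathcal{X}_k}\ \min_{x' \subseteq y} d_{(co)\mathcal{A}_{r_1,r_2}}(x, x'). \quad (24)$$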
In this section we show that HOMP is unable to compute diameters of CCs, using ℓ-dimensional tori as a counter example. Corollary B.6 implies that any pair of ℓ-dimensional tori with the same number of nodes (0-cells) is indistinguishable by HOMP, therefore it is enough to construct such tori with different diameters. E.g. the tori T 4,4,32 and T 8,8,8 have the same number of 0-cells but different diameters and cross-diameters for any (co)adjacency function and k = 1, 2, 3. This can be extended to tori of any dimensions. More formally we have the following proposition for the (0, 1)-diameter.
Proposition B.7. If $T_{p_1,\dots,p_\ell}$ and $T_{p'_1,\dots,p'_\ell}$ are ℓ-dimensional tori satisfying
1. $p_1 \cdots p_\ell = p'_1 \cdots p'_\ell$,
2. $\forall j \in \{1, \dots, \ell\}$, $p_j, p'_j \geq 3$, and
3. $\sum_{j=1}^{\ell} \lfloor \frac{p_j}{2} \rfloor \neq \sum_{j=1}^{\ell} \lfloor \frac{p'_j}{2} \rfloor$,
then
$$\mathrm{diam}_{\mathcal{A}_{0,1}}(T_{p_1,\dots,p_\ell}) \neq \mathrm{diam}_{\mathcal{A}_{0,1}}(T_{p'_1,\dots,p'_\ell}) \quad (25)$$
but for any HOMP model $M$,
$$M(T_{p_1,\dots,p_\ell}) = M(T_{p'_1,\dots,p'_\ell}). \quad (26)$$
Proof. Conditions 1 and 2 imply that $T_{p_1,\dots,p_\ell}$ and $T_{p'_1,\dots,p'_\ell}$ are indistinguishable by HOMP. To see that they have different diameters, observe that the graph induced on the nodes of $T_{p_1,\dots,p_\ell}$ by the adjacency neighborhood $\mathcal{A}_{0,1}$ is the Cartesian product of the cyclic graphs $\mathrm{Cyc}(p_1), \dots, \mathrm{Cyc}(p_\ell)$. Consequently, since the diameter of a Cartesian product is equal to the sum of the diameters of its factors, we have:
$$\mathrm{diam}_{\mathcal{A}_{0,1}}(T_{p_1,\dots,p_\ell}) = \sum_{j=1}^{\ell} \mathrm{diam}(\mathrm{Cyc}(p_j)) = \sum_{j=1}^{\ell} \Big\lfloor \frac{p_j}{2} \Big\rfloor \neq \sum_{j=1}^{\ell} \Big\lfloor \frac{p'_j}{2} \Big\rfloor = \sum_{j=1}^{\ell} \mathrm{diam}(\mathrm{Cyc}(p'_j)) = \mathrm{diam}_{\mathcal{A}_{0,1}}(T_{p'_1,\dots,p'_\ell}). \quad (27)$$
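The diameter formula can be checked on the pair $T_{4,4,32}$ and $T_{8,8,8}$ mentioned earlier in this section. The following sketch (with a hypothetical function name and 0-based coordinates) computes the (0, 1)-diameter by breadth-first search from a single node, which suffices here because the node graph of a torus is vertex-transitive.

```python
from collections import deque
from math import prod

def torus_diameter(p):
    """(0,1)-diameter of T_p: BFS eccentricity of the origin in the product of cycles."""
    start = tuple(0 for _ in p)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for j in range(len(p)):
            for step in (-1, 1):
                t = s[:j] + ((s[j] + step) % p[j],) + s[j + 1:]
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return max(dist.values())

thin, cube = (4, 4, 32), (8, 8, 8)
same_node_count = prod(thin) == prod(cube)                # both have 512 nodes
diameters = (torus_diameter(thin), torus_diameter(cube))  # (20, 12)
```

Both tori have 512 nodes, yet the BFS diameters are 20 and 12, matching $\sum_j \lfloor p_j / 2 \rfloor$.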
Homology and Betti numbers. The r-th homology group of a cellular complex encodes the structure of "r-dimensional holes" in the space (e.g. a circle has a single 1-dimensional hole, a sphere has a single 2-dimensional hole, etc.). We denote the r-th homology of a CC $\mathcal{X}$ by $H_r(\mathcal{X})$.
The rank of the r-th homology group (i.e. the size of the minimal generating set) is called the r-th Betti number, denoted by b r (X ).
Proposition B.8 (HOMP cannot distinguish complexes based on homology). Let $T = T_{p_1,\dots,p_\ell}$ be an ℓ-dimensional torus and $T' = T_{p^1_1,\dots,p^1_\ell} \sqcup T_{p^2_1,\dots,p^2_\ell}$ be a disjoint union of two tori. If $p_1 \cdots p_\ell = p^1_1 \cdots p^1_\ell + p^2_1 \cdots p^2_\ell$ and $\forall j \in \{1, \dots, \ell\}$, $p_j, p^1_j, p^2_j \geq 3$, then $T$ and $T'$ are HOMP-indistinguishable but have different homology groups and Betti numbers of all orders: $\forall r \in \{0, \dots, \ell\}$, $H_r(T) \neq H_r(T')$ and $b_r(T) \neq b_r(T')$.
Proof. Denote the two components of $T'$ by $T^1$ and $T^2$. First, Lemma B.5 implies that $T$, $T^1$, and $T^2$ have a common cover. Thus, since $T$ and $T'$ have the same number of cells, Theorem B.1 implies they are HOMP-indistinguishable. Additionally, for every $r$, $H_r(T) = \mathbb{Z}^{\binom{\ell}{r}}$ (see e.g. Hatcher (2002)) and, since $T'$ is a disjoint union of $T^1$ and $T^2$, $H_r(T') = H_r(T^1) \times H_r(T^2) = \mathbb{Z}^{\binom{\ell}{r}} \times \mathbb{Z}^{\binom{\ell}{r}} = \mathbb{Z}^{2\binom{\ell}{r}}$. Therefore, $\forall r \in \{0, \dots, \ell\}$, $H_r(T) \neq H_r(T')$ and $b_r(T) = \binom{\ell}{r} \neq 2\binom{\ell}{r} = b_r(T')$.
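As a concrete instance of the proposition, take $T = T_{3,6}$ and $T' = T_{3,3} \sqcup T_{3,3}$ (an example chosen here for illustration, with hypothetical helper names). The sketch below only evaluates the closed-form Betti numbers used in the proof; it does not compute homology from the complex.

```python
from math import comb, prod

def torus_betti(ell):
    """Betti numbers b_0, ..., b_ell of an ell-dimensional torus: b_r = C(ell, r)."""
    return [comb(ell, r) for r in range(ell + 1)]

def disjoint_union_betti(*betti_lists):
    """Betti numbers add up over disjoint unions."""
    return [sum(bs) for bs in zip(*betti_lists)]

single = (3, 6)            # T = T_{3,6}
parts = [(3, 3), (3, 3)]   # T' = T_{3,3} disjoint union T_{3,3}

same_node_count = prod(single) == sum(prod(q) for q in parts)  # 18 == 9 + 9
betti_T = torus_betti(len(single))
betti_T_prime = disjoint_union_betti(*(torus_betti(len(q)) for q in parts))
```

Here the Betti numbers are (1, 2, 1) for $T$ and (2, 4, 2) for $T'$, so they differ in every order while the node counts agree.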
Orientability. We now turn our attention to HOMP's capability to detect another common topological property: orientability. Loosely speaking, a surface is orientable if one can distinguish between an "inner" and an "outer" side of the surface. A common example of two locally isomorphic surfaces where one is orientable and the other is not is the cylinder and the Möbius strip. For an in-depth discussion of orientability and the Möbius strip see Hatcher (2002). We now realize both of these surfaces as cellular complexes. A visualization of the construction can be seen in Figure 1(b). We begin by defining two auxiliary functions.
Definition B.9. For $h, p \in \mathbb{N}$ define $\rho^{h,p}_{cyl}, \rho^{h,p}_{möb} : \mathbb{Z}^2 \to \mathbb{Z}^2$ by
$$\rho^{h,p}_{cyl}(s) = (s_1, s_2 \bmod p), \quad (28)$$
$$\rho^{h,p}_{möb}(s) = \begin{cases} (s_1, s_2 \bmod p) & s_2 \bmod 2p \leq p \\ (h + 1 - s_1, s_2 \bmod p) & s_2 \bmod 2p > p. \end{cases} \quad (29)$$
Using $\rho^{h,p}_{cyl}$ and $\rho^{h,p}_{möb}$ we can construct the cylinder and the Möbius strip.
Definition B.10 (Cylinder as a CC). Given two integers $h, p$, the cylinder $\mathrm{Cyl}_{h,p}$ is a 2-dimensional combinatorial complex $(S, \mathcal{X}, \mathrm{rk})$ defined by:
$$S = [h] \times [p], \quad (30)$$
$$\mathcal{X}_r = \{s^k \mid s \in S,\ k \in \{0,1\}^2,\ k_1 + k_2 = r,\ \rho^{h,p}_{cyl}(s + k) \in S\}, \quad (31)$$
$$\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1 \cup \mathcal{X}_2, \quad (32)$$
where $s^k$ is defined by:
$$s^k = \{\rho^{h,p}_{cyl}(s + k') \mid k' \in \{0,1\}^2,\ k' \leq k\}. \quad (33)$$
Definition B.11 (Möbius strip as a CC). Given two integers $h, p$, the Möbius strip $\mathrm{Möb}_{h,p}$ is a 2-dimensional combinatorial complex $(S, \mathcal{X}, \mathrm{rk})$ defined by:
$$S = [h] \times [p], \quad (34)$$
$$\mathcal{X}_r = \{s^k \mid s \in S,\ k \in \{0,1\}^2,\ k_1 + k_2 = r,\ \rho^{h,p}_{möb}(s + k) \in S\}, \quad (35)$$
$$\mathcal{X} = \mathcal{X}_0 \cup \mathcal{X}_1 \cup \mathcal{X}_2, \quad (36)$$
where $s^k$ is defined by:
$$s^k = \{\rho^{h,p}_{möb}(s + k') \mid k' \in \{0,1\}^2,\ k' \leq k\}. \quad (37)$$
We now show HOMP is unable to distinguish between CCs based on orientability:
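Proposition B.12 (HOMP cannot detect orientability). For any two integers $h, p \in \mathbb{N}$ such that $h, p \geq 3$, $\mathrm{Cyl}_{h,p}$ and $\mathrm{Möb}_{h,p}$ are HOMP-indistinguishable, but $\mathrm{Cyl}_{h,p}$ is orientable as a topological space while $\mathrm{Möb}_{h,p}$ is not.
Proof. First, the fact that the cylinder is orientable whereas the Möbius strip is not is well known (see e.g. Hatcher (2002) for a proof). As for HOMP-indistinguishability, consider the wide cylinder $\mathrm{Cyl}_{h,2p}$ with height $h$ and perimeter $2p$. We show that $\mathrm{Cyl}_{h,2p}$ covers both $\mathrm{Cyl}_{h,p}$ and $\mathrm{Möb}_{h,p}$. Since the two CCs are connected and have the same number of nodes, Theorem B.1 then implies that they are HOMP-indistinguishable. Denote by $\tilde{S}, S^{cyl}, S^{möb}$ and $\tilde{\mathcal{X}}, \mathcal{X}^{cyl}, \mathcal{X}^{möb}$ the sets of nodes and cells of $\mathrm{Cyl}_{h,2p}$, $\mathrm{Cyl}_{h,p}$ and $\mathrm{Möb}_{h,p}$ respectively. Define $\rho : \tilde{S} \to S^{cyl}$ and $\rho' : \tilde{S} \to S^{möb}$ by $\rho = \rho^{h,p}_{cyl}|_{\tilde{S}}$ and $\rho' = \rho^{h,p}_{möb}|_{\tilde{S}}$. It is easy to verify that $\rho(\tilde{S}) = S^{cyl}$ and $\rho'(\tilde{S}) = S^{möb}$, thus $\rho$ and $\rho'$ are well defined and surjective. The maps $\rho, \rho'$ induce maps $P(\tilde{S}) \to P(S^{cyl})$ and $P(\tilde{S}) \to P(S^{möb})$; by abuse of notation we refer to these maps by $\rho, \rho'$ as well. To show that $\rho$ and $\rho'$ are covering maps, we first show that they are rank-preserving (i.e. that $\rho(\tilde{\mathcal{X}}_r) = \mathcal{X}^{cyl}_r$ and $\rho'(\tilde{\mathcal{X}}_r) = \mathcal{X}^{möb}_r$), and then show that they are local isomorphisms. Recall that all elements of $\tilde{\mathcal{X}}_r$ are of the form $\tilde{s}^k$ for some $\tilde{s} \in \tilde{S}$ and $k \in \{0,1\}^2$ such that $k_1 + k_2 = r$. For every $k' \leq k$,
$$\rho\big(\rho^{h,2p}_{cyl}(\tilde{s} + k')\big) = \rho^{h,p}_{cyl}\big(\rho(\tilde{s}) + k'\big), \quad (38)$$
so $\rho(\tilde{s}^k) = \rho(\tilde{s})^k$. Additionally,
$$\rho'\big(\rho^{h,2p}_{cyl}(\tilde{s} + k')\big) = \begin{cases} \rho^{h,p}_{möb}\big(\rho'(\tilde{s}) + k'\big) & \tilde{s}_2 \leq p \\ \rho^{h,p}_{möb}\big(\rho'(\tilde{s}) + (-k'_1, k'_2)\big) & \tilde{s}_2 > p, \end{cases} \quad (39)$$
so
$$\rho'(\tilde{s}^k) = \begin{cases} \rho'(\tilde{s})^k & \tilde{s}_2 \leq p \\ \big(\rho'(\tilde{s}) + (-1, 0)\big)^k & \tilde{s}_2 > p. \end{cases} \quad (40)$$
By the definitions of $\tilde{\mathcal{X}}_r$, $\mathcal{X}^{cyl}$ and $\mathcal{X}^{möb}$ we now have $\rho(\tilde{\mathcal{X}}_r) = \mathcal{X}^{cyl}_r$ and $\rho'(\tilde{\mathcal{X}}_r) = \mathcal{X}^{möb}_r$ as needed. Since $\rho$ and $\rho'$ are extended to $P(\tilde{S})$ from $\tilde{S}$, for every $x, y, z \in \tilde{\mathcal{X}}$:
• $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)$ and $\rho'(x) \subseteq \rho'(y)$.
• $x, y \subseteq z \Rightarrow \rho(x), \rho(y) \subseteq \rho(z)$ and $\rho'(x), \rho'(y) \subseteq \rho'(z)$.
• $z \subseteq x, y \Rightarrow \rho(z) \subseteq \rho(x), \rho(y)$ and $\rho'(z) \subseteq \rho'(x), \rho'(y)$.
Therefore, $\rho$ and $\rho'$ preserve all natural neighborhood functions. Finally, since $h, p \geq 3$, for $x, y \in \tilde{\mathcal{X}}$ and $\mathcal{N} \in \mathcal{N}_{nat}$, $y \in \mathcal{N}(x) \Rightarrow \rho(x) \neq \rho(y)$ and $\rho'(x) \neq \rho'(y)$. This implies that $\rho$ and $\rho'$ are local isomorphisms, completing the proof.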
Planarity. A topological space is considered planar if it can be continuously embedded in R 2 . Proposition B.12 provides us with the following corollary.

Section: Corollary B.13 (HOMP cannot detect planarity).
There exist pairs of cellular complexes X , X ′ such that the induced topology of X is planar while the induced topology of X ′ is not, but X and X ′ are HOMP-indistinguishable.
Proof. The CCs $\mathrm{Cyl}_{h,p}$ and $\mathrm{Möb}_{h,p}$ for $p, h \geq 3$ are HOMP-indistinguishable according to Proposition B.12. The Möbius strip is not planar (see e.g. Hatcher (2002)), whereas the cylinder is.

Section: B.3 LIFTING AND POOLING
In this section, we rigorously state and prove Proposition 4.4. We begin by focusing on lifting operations, proving Proposition 4.4 for triangular lifting, as used in Bodnar et al. (2021b) and Bodnar et al. (2021a). Next, we address pooling operations, proving the proposition for MOG pooling (Hajij et al., 2018), which was used in conjunction with HOMP in Hajij et al. (2022b). While we only provide proofs for triangular lifting and MOG pooling, we note that this phenomenon generalizes to other lifting and pooling methods as well.
Lifting. We first define triangular lifting on graphs, denoted by 3-CL.

Definition B.14 (Triangular lifting). The triangular lift of a graph $G = (V, E)$ is a combinatorial complex denoted by 3-CL($G$), with $S = V$, $\mathcal{X}_0 = \{\{v\} \mid v \in V\}$, $\mathcal{X}_1 = E$, and $\mathcal{X}_2 = \{\{x, y, z\} \mid x \sim y,\ x \sim z,\ y \sim z\}$.
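The triangular lift is straightforward to compute for small graphs. The following is a brute-force sketch of the definition above (the function name and the dictionary output format are choices made here, not taken from the paper); it enumerates all node triples, so it is meant for illustration rather than for large graphs.

```python
from itertools import combinations

def triangular_lift(nodes, edges):
    """Return the cells of 3-CL(G) grouped by rank, as sets of frozensets."""
    edge_set = {frozenset(e) for e in edges}
    x0 = {frozenset([v]) for v in nodes}
    # A triple is a 2-cell exactly when all three of its pairs are edges.
    x2 = {frozenset(t) for t in combinations(sorted(nodes), 3)
          if all(frozenset(pair) in edge_set for pair in combinations(t, 2))}
    return {0: x0, 1: edge_set, 2: x2}

# A triangle 0-1-2 with a pendant edge 2-3.
lifted = triangular_lift(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
```

For this graph the lift has four 0-cells, four 1-cells and a single 2-cell, the triangle {0, 1, 2}.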
We now formally state Proposition 4.4 for triangular lifting.
Proposition B.15. There exist pairs of graphs $G$ and $G'$ such that the combinatorial complexes $\mathcal{X}$ = 3-CL($G$) and $\mathcal{X}'$ = 3-CL($G'$) are indistinguishable by HOMP. This occurs despite the fact that the cross-diameter $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X})$ is finite while $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}')$ is infinite.
Since $n \cdot k > 3$, the only triangles in $G$ and $G'$ are of the form $\{b_i, a_{i \cdot n}, a_{i \cdot n + 1}\}$. Denote the combinatorial complexes constructed from $G$ and $G'$ by applying triangular lifting as $\mathcal{X}$ and $\mathcal{X}'$, respectively. Additionally, denote 3-CL($\mathrm{Star}_{n,k}$) by $\mathcal{X}^*$. Since $\mathcal{X}'$ consists of two disconnected copies of $\mathcal{X}^*$, and the complexes $\mathcal{X}'$ and $\mathcal{X}$ are of equal size, Theorem B.1 implies that in order to show HOMP cannot distinguish between $\mathcal{X}$ and $\mathcal{X}'$, it suffices to show that $\mathcal{X}$ is a cover of $\mathcal{X}^*$. Letting $S$ and $S^*$ be the node sets corresponding to the CCs $\mathcal{X}$, $\mathcal{X}^*$, we construct a covering map $\rho : S \to S^*$ defined by:
$$\rho(a_i) = a'_{i \bmod n \cdot k}, \qquad \rho(b_i) = b'_{i \bmod k}. \quad (41)$$
Note that $\rho$ induces a map from $P(S)$ to $P(S^*)$ where $P(\cdot)$ denotes the power set. We abuse notation and refer to this function by $\rho$ as well. We notice that $\rho$ is surjective, and that for any pair of nodes $u, v$ in the graph $G$ we have:
$$u \sim_G v \Rightarrow \rho(u) \sim_{G^*} \rho(v). \quad (42)$$
This implies that $\rho$ preserves triangles as well. In addition, since $n \cdot k > 3$, $\rho$ is locally injective. Thus, $\rho$ is a covering map, and $\mathcal{X}$ and $\mathcal{X}'$ are indistinguishable by HOMP. Finally, it is evident that $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}') = \infty$ since it consists of two disjoint connected components, each containing a non-empty set of nodes (0-cells) and triangles (2-cells). Conversely, $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}) < \infty$ because it consists of a single connected component.
2016), a topology-preserving pooling algorithm which was previously used in combination with HOMP in Hajij et al. (2022b). We now define mapper on graphs (MOG), a pooling procedure that takes a general graph as input and produces a 2-dimensional combinatorial complex.

Definition B.16 (Mapper on graphs). Let $G = (V, E)$ be a graph, $g : V \to \mathbb{R}$ be a node function, and $\mathcal{U} = \{U_\alpha\}_{\alpha \in I}$ an open covering of $\mathbb{R}$. The MOG pooling of the graph, $\mathrm{MOG}(G) = (S, \mathcal{X}, \mathrm{rk})$, is given by the following construction.
1. Compute the pull-back cover $g^*(\mathcal{U}) = \{g^{-1}(U_\alpha)\}_{\alpha \in I}$.
2. Construct $V_{MOG}$ to be the set of connected components of the subgraphs induced by $g^*(\mathcal{U})$.
3. Construct the pooled CC to be $(S, \mathcal{X}, \mathrm{rk})$ with nodes $S = V$, cells $\mathcal{X} = V \cup E \cup V_{MOG}$, and rank
$$\mathrm{rk}(x) = \begin{cases} 0 & x \in V \\ 1 & x \in E \\ 2 & x \in V_{MOG}. \end{cases}$$
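The three steps of the construction can be sketched in a few lines. This is an illustrative reading of Definition B.16 rather than the implementation of Hajij et al.: the function names are hypothetical, the cover is given as a finite list of open intervals `(lo, hi)` that only needs to cover the range of `g`, and cells are returned grouped by rank so that a component which happens to coincide with an edge as a set is still recorded as a 2-cell.

```python
def connected_components(nodes, adj):
    """Connected components of the subgraph induced on `nodes`."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(u for u in adj[v] if u in nodes and u not in comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps

def mog_pooling(vertices, edges, g, cover):
    """Cells of MOG(G) grouped by rank, for a node function g and an interval cover."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    two_cells = set()
    for lo, hi in cover:
        pulled_back = {v for v in vertices if lo < g[v] < hi}      # step 1: g^{-1}(U_alpha)
        two_cells.update(connected_components(pulled_back, adj))   # step 2: components
    return {0: {frozenset([v]) for v in vertices},                 # step 3: ranks
            1: {frozenset(e) for e in edges},
            2: two_cells}

# Path graph 0-1-2-3-4-5 with g measuring the distance to the middle edge.
vertices = list(range(6))
edges = [(i, i + 1) for i in range(5)]
g = {0: 2, 1: 1, 2: 0, 3: 0, 4: 1, 5: 2}
pooled = mog_pooling(vertices, edges, g, cover=[(-1, 1.5), (0.5, 3)])
```

In this example the first interval pulls back to the connected set {1, 2, 3, 4}, while the second pulls back to {0, 1, 4, 5}, which splits into the two components {0, 1} and {4, 5}.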
are Cyc(n) isomorphic, all nodes in $P_{i,n}$ are $G_n$ isomorphic. The same holds for $P'_{1,n} = P'_1 \times V_n$, $P'_{2,n} = P'_2 \times V_n$. Thus, again, the function $g$, defined as the average shortest path distance of each node, is constant on each of these sets. This implies that by choosing a sufficiently fine covering, the 2-cells defined by the MOG algorithm for the graphs $G_n$ and $G'_n$ will be $x_{i,n} = x_i \times V_n$ and $x'_{i,n} = x'_i \times V_n$ respectively for all $i \in [3]$. Defining $\mathcal{X}_n = \mathrm{MOG}(G_n)$, $\mathcal{X}'_n = \mathrm{MOG}(G'_n)$, we now aim to show that HOMP cannot distinguish between these two CCs.
We define $\tilde{G}$ to be the 1-skeleton of $\tilde{\mathcal{X}}$, $\tilde{G}_n = \tilde{G} \times \mathrm{Cyc}(n)$ and $\tilde{\mathcal{X}}_n$ to be the CC whose 1-skeleton is $\tilde{G}_n$ and whose 2-cells are $\{x \times V_n \mid x \in \tilde{\mathcal{X}}_2\}$. Let $\rho : \tilde{S} \to S$, $\rho' : \tilde{S} \to S'$ be the covering maps from $\tilde{\mathcal{X}}$ to $\mathcal{X}$ and $\mathcal{X}'$ respectively, as depicted in Figure 11. Define $\rho_n : \tilde{S} \times V_n \to S \times V_n$ by $\rho_n(s, v) = (\rho(s), v)$ and $\rho'_n : \tilde{S} \times V_n \to S' \times V_n$ by $\rho'_n(s, v) = (\rho'(s), v)$. By the definitions of $\mathcal{X}_n$, $\mathcal{X}'_n$ and $\tilde{\mathcal{X}}_n$, these two maps are covering maps from $\tilde{\mathcal{X}}_n$ to $\mathcal{X}_n$ and $\mathcal{X}'_n$ respectively. Since $\mathcal{X}_n$, $\mathcal{X}'_n$ are connected and of the same size, Theorem B.1 implies they are indistinguishable by HOMP.
Secondly, since $G_n$ is a graph Cartesian product, for every $(s, v), (s', v') \in S \times V_n$ we have:
$$d_{G_n}\big((s, v), (s', v')\big) = d_G(s, s') + d_{\mathrm{Cyc}(n)}(v, v'). \quad (43)$$
Thus, it is easy to check that $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}_n) = \mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}) = 3$. The same reasoning shows that $\mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}'_n) = \mathrm{diam}^2_{\mathcal{A}_{0,1}}(\mathcal{X}') = 2$, concluding the proof.

Section: C MULTI-CELLULAR NETWORKS
In this section we motivate and formally define MCN, expanding the discussion in Section 5. We rigorously define both the equivariant linear updates and the general tensor diagram forward pass.
Multi-cellular cochain spaces. As discussed in Section 5, given an ℓ-dimensional CC $\mathcal{X}$ and an (ℓ + 1)-tuple $\mathbf{k} \in \mathbb{N}^{\ell+1}$, the space of $\mathbf{k}$-multi-cellular cochains is defined by:
$$C^{\mathbf{k}}(\mathcal{X}, \mathbb{R}^d) = \{h^{\mathbf{k}} \mid h^{\mathbf{k}} : \mathcal{X}_0^{k_0} \times \cdots \times \mathcal{X}_\ell^{k_\ell} \to \mathbb{R}^d\}. \quad (44)$$
Multi-cellular cochain spaces are a natural generalization of standard cochain spaces, providing a way to represent diverse types of CC data. For example, when $\mathbf{k} = e_i$, we get that $C^{e_i}(\mathcal{X}, \mathbb{R}^d)$ is the space of functions $h^{e_i} : \mathcal{X}_i \to \mathbb{R}^d$, i.e. the space of all possible i-rank cell features ($C^{e_i} \cong C^i$). In addition, for any pair of integers $r_1, r_2$, the incidence neighborhood function $\mathcal{B}_{r_1,r_2}$ can be encoded as the map $h : \mathcal{X}_{r_1} \times \mathcal{X}_{r_2} \to \mathbb{R}^d$ defined by:
$$h(x, y) = \begin{cases} 1 & y \in \mathcal{B}_{r_1,r_2}(x) \\ 0 & \text{otherwise.} \end{cases} \quad (45)$$
Thus $\mathcal{B}_{r_1,r_2}$ can be regarded as an element of the space $C^{e_{r_1} + e_{r_2}}$. Finally, neighborhood functions of the type $(co)\mathcal{A}_{r_1,r_2}$ can be similarly encoded as the map $h : \mathcal{X}_{r_1}^2 \to \mathbb{R}^d$ defined by:
$$h(x, y) = \begin{cases} 1 & y \in (co)\mathcal{A}_{r_1,r_2}(x) \\ 0 & \text{otherwise.} \end{cases} \quad (46)$$
Thus $(co)\mathcal{A}_{r_1,r_2}$ can be regarded as an element of the space $C^{2 \cdot e_{r_1}}$.
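The encodings in Equations 45 and 46 amount to building indicator matrices indexed by cells. The sketch below illustrates this for a toy complex with scalar features (d = 1), representing cochains as nested lists; the function names are hypothetical, and incidence and adjacency are read as "x is properly contained in y" and "x and y are distinct cells contained in a common cell of the higher rank", which is how the relations are used elsewhere in this appendix.

```python
def incidence_cochain(cells_r1, cells_r2):
    """h(x, y) = 1 if the r2-cell y contains the r1-cell x, else 0 (Equation 45)."""
    return [[1 if x < y else 0 for y in cells_r2] for x in cells_r1]

def adjacency_cochain(cells_r1, cells_r2):
    """h(x, y) = 1 if x != y lie in a common r2-cell, else 0 (Equation 46)."""
    return [[1 if x != y and any(x < z and y < z for z in cells_r2) else 0
             for y in cells_r1] for x in cells_r1]

fs = frozenset
nodes = [fs("A"), fs("B"), fs("C")]   # rank-0 cells of a path A - B - C
edges = [fs("AB"), fs("BC")]          # rank-1 cells

b01 = incidence_cochain(nodes, edges)  # element of C^{e_0 + e_1}, shape 3 x 2
a01 = adjacency_cochain(nodes, edges)  # element of C^{2 e_0},     shape 3 x 3
```

The first cochain lives on $\mathcal{X}_0 \times \mathcal{X}_1$ and the second on $\mathcal{X}_0^2$, matching the spaces $C^{e_0 + e_1}$ and $C^{2 \cdot e_0}$.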
Multi-cellular cochain spaces recover many linear spaces studied in several previous works. For example, C ker matches the features space of a k-IGN layer operating on r-cells, and C er 1 +er 2 corresponds to the input space of the exchangeable matrix layers introduced in Hartford et al. (2018).
Equivariant linear maps between multi-cellular cochain spaces. Let $\mathcal{X}$ be a combinatorial complex. We define $n_r = |\mathcal{X}_r|$, representing the size of $\mathcal{X}_r$. For a tuple $\mathbf{k} = (k_0, \dots, k_\ell)$, we define the product space $\mathcal{X}^{\mathbf{k}} = \mathcal{X}_0^{k_0} \times \cdots \times \mathcal{X}_\ell^{k_\ell}$. The group $S_{n_0} \times \cdots \times S_{n_\ell}$ is denoted by $G$.
We aim to find a basis for the space of equivariant linear layers $L : C^{\mathbf{k}}(\mathcal{X}, \mathbb{R}^d) \to C^{\mathbf{k}'}(\mathcal{X}, \mathbb{R}^{d'})$ for each pair of tuples $\mathbf{k}, \mathbf{k}'$. Since the space $C^{\mathbf{k}}$ can be identified with $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d}$, the space of linear maps between the two can be considered as the space of matrices:
$$C^{\mathbf{k}} \otimes C^{\mathbf{k}'} = \mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d \times n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}, \quad (47)$$
where we use $\otimes$ to denote the tensor product of vector spaces. The index set corresponding to this space of matrices is $\mathcal{X}^{\mathbf{k}} \times [d] \times \mathcal{X}^{\mathbf{k}'} \times [d']$.
Notice that the group $G$ acts naturally on $\mathcal{X}^{\mathbf{k}} \times \mathcal{X}^{\mathbf{k}'}$. For each $G$-orbit $\gamma$ and integers $j_1 \in [d]$, $j_2 \in [d']$, we define a matrix $B^{\gamma,j_1,j_2} \in \mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d \times n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$:
$$B^{\gamma,j_1,j_2}_{a,i_1,b,i_2} = \begin{cases} 1 & (a, b) \in \gamma,\ i_1 = j_1,\ i_2 = j_2 \\ 0 & \text{otherwise.} \end{cases} \quad (48)$$
Here $a \in [n_0]^{k_0} \times \cdots \times [n_\ell]^{k_\ell}$, $b \in [n_0]^{k'_0} \times \cdots \times [n_\ell]^{k'_\ell}$, $i_1 \in [d]$, $i_2 \in [d']$.
If $h \in C^{\mathbf{k}}$ and $h' = B^{\gamma,j_1,j_2} h$ we have:
$$h'(b)_j = \begin{cases} \sum_{a : (a,b) \in \gamma} h(a)_{j_1} & j = j_2 \\ 0 & \text{otherwise,} \end{cases} \quad (49)$$
where, by abuse of notation, $a, b$ are used interchangeably to describe multi-indices and multi-cells. These maps were established as a basis for the space of all equivariant linear functions $L : \mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d} \to \mathbb{R}^{n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$ in Maron et al. (2018) and thus can be used to characterize the space of equivariant linear layers $L : C^{\mathbf{k}} \to C^{\mathbf{k}'}$. This framework encompasses many layers from previously studied models. For example, by setting $\mathbf{k} = \mathbf{k}' = k \cdot e_r$, the space of equivariant linear layers corresponds to the space of k-IGN layers, which take as input graphs defined on the r-rank cells of the input complexes (i.e. graphs whose node sets are $\mathcal{X}_r$). Similarly, with $\mathbf{k} = \mathbf{k}' = e_r + e_{r'}$ for $r \neq r' \in \mathbb{N}$, this space aligns with the space of linear maps used to construct the exchangeable matrix layer as described in Hartford et al. (2018).
We can now use this basis to construct learnable equivariant linear layers by taking a parametric linear combination of the basis functions:
$$F(h) = \beta\Big(\sum_{\gamma, j_1, j_2} w_{\gamma, j_1, j_2}\, B^{\gamma, j_1, j_2} h\Big). \tag{50}$$
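The orbit basis above can be illustrated in its simplest instance. The following is a minimal sketch, assuming a single rank, $\mathbf{k} = \mathbf{k}' = e_r$, one feature channel ($d = d' = 1$), and $G = S_n$ acting on $[n] \times [n]$; in that case there are exactly two orbits (the diagonal and the off-diagonal pairs). The function names are hypothetical, the nonlinearity $\beta$ is omitted, and integer features are used so that the equivariance check is exact.

```python
from itertools import permutations, product


def orbits_of_pairs(n):
    """Orbits of S_n acting diagonally on [n] x [n] (assumes n >= 2)."""
    diagonal = {(a, a) for a in range(n)}
    off_diagonal = {(a, b) for a, b in product(range(n), repeat=2) if a != b}
    return [diagonal, off_diagonal]


def apply_basis(orbit, h):
    """Basis map of Equation 49: h'(b) = sum of h(a) over all (a, b) in the orbit."""
    out = [0] * len(h)
    for a, b in orbit:
        out[b] += h[a]
    return out


def linear_layer(weights, h):
    """Weighted combination of the basis maps, one weight per orbit."""
    out = [0] * len(h)
    for w, orbit in zip(weights, orbits_of_pairs(len(h))):
        for b, value in enumerate(apply_basis(orbit, h)):
            out[b] += w * value
    return out


def permute(h, perm):
    """Relabel indices: the feature at index a moves to index perm[a]."""
    out = [0] * len(h)
    for a, value in enumerate(h):
        out[perm[a]] = value
    return out


def is_equivariant(weights, h):
    """Check L(perm . h) == perm . L(h) for every permutation of the indices."""
    return all(
        linear_layer(weights, permute(h, p)) == permute(linear_layer(weights, h), p)
        for p in permutations(range(len(h)))
    )


features = [1, 2, 4]
print(linear_layer((1, 0), features))  # diagonal orbit only: identity map
print(linear_layer((0, 1), features))  # off-diagonal orbit: sum over the other indices
print(is_equivariant((2, -1), features))
```

For general tuples $\mathbf{k}, \mathbf{k}'$ the orbits are indexed by richer equality patterns, but the construction of each basis map from its orbit is the same.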
To construct MCN, we augment HOMP architectures with these equivariant layers by incorporating equivariant linear layers into the tensor diagram scheme. Figure 5 depicts an MCN tensor diagram. We now describe the components of the MCN scheme.
Diagram. Similar to HOMP tensor diagrams, MCN tensor diagrams are layered directed graphs with labeled nodes and edges. Each node is labeled by a multi-cellular cochain space, extending the class of node labels used in HOMP tensor diagrams. Directed edges with source and target nodes labeled by C er can be labeled by any neighborhood function, while edges between nodes labeled by other types of multi-cellular cochain spaces are labeled with the new label "equiv".
Input. The input to the MCN model is determined by the 0-th layer of the tensor diagram, whose nodes can be labeled by the following types of multi-cellular cochain spaces:
(1) Nodes labeled by $\mathcal{C}^{e_r}$, which take the $r$-rank cell features as input;
(2) Nodes labeled by $\mathcal{C}^{e_{r_1} + e_{r_2}}$, which take the matrix form of the incidence neighborhood $B_{r_1, r_2}$ as input;
(3) Nodes labeled by $\mathcal{C}^{2e_r}$, which take the matrix form of the (co)adjacency matrices $(co)A_{r, r'}$ as input.

Update. At each layer of an MCN tensor diagram, if $v$ is a node labeled by $\mathcal{C}^{\mathbf{k}}$, we compute a multi-cellular cochain $h^{(v)} \in \mathcal{C}^{\mathbf{k}}$ by
$$h^{(v)}_x = \sum_{u \in \mathrm{pred}(v)} m_{u,v}(x), \tag{51}$$
where $x \in \mathcal{X}^{\mathbf{k}}$ and $\mathrm{pred}(v)$ denotes the set of predecessor nodes in the diagram. Here the messages $m_{u,v} \in \mathcal{C}^{\mathbf{k}}$ are computed based on the label of the edge $(u, v)$. If the edge is labeled by a neighborhood function $\mathcal{N}$ (in which case $v$ and $u$ are labeled by standard cochain spaces), the message $m_{u,v}$ is computed by
$$m_{u,v}(x) = \sum_{y \in \mathcal{N}(x)} \mathrm{MLP}_{u,v}\big(h^{(u)}_x, h^{(u)}_y\big). \tag{52}$$
[...] of $O(d \cdot n_0 \cdot n_1)$ and $O(d \cdot n_0 \cdot n_2)$.
As we demonstrate in Proposition 6.6, SMCN architectures with runtime O(d • n 0 • n 2 ) are still strictly more expressive than HOMP. This is useful, since in most natural cases n 2 ≪ n 0 . This allows for flexibility in trading off computational complexity and expressive power.

Section: E MCN EXPRESSIVE POWER
In this section, we analyze the expressive power of MCN defined in Section 6. We begin by formally defining CC isomorphism, as described in Hajij et al. (2022b).
Definition E.1 (CC isomorphism). A pair of CCs (S, X , rk), (S ′ , X ′ , rk ′ ) are isomorphic if there exists a bijective map ρ : X → X ′ such that:
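1. $\mathrm{rk}(x) = \mathrm{rk}'(\rho(x))$ for all $x \in \mathcal{X}$,
2. $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)$ for all $x, y \in \mathcal{X}$.
If there exists an isomorphism $\rho : \mathcal{X} \to \mathcal{X}'$, we say that $\mathcal{X}$ and $\mathcal{X}'$ are isomorphic; if no such isomorphism exists, we say that $\mathcal{X}$ and $\mathcal{X}'$ are non-isomorphic.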
Proposition E.2 (MCN expressive power). If $\mathcal{X}$ and $\mathcal{X}'$ are non-isomorphic, there exists an MCN model $\mathcal{M}$ such that
$$\mathcal{M}(\mathcal{X}) \neq \mathcal{M}(\mathcal{X}'). \tag{56}$$
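Proof. First, let $H = (V, E)$ and $H' = (V', E')$ be the Hasse graphs of $\mathcal{X}$ and $\mathcal{X}'$ respectively, defined by
$$V = \mathcal{X}, \tag{57}$$
$$V' = \mathcal{X}', \tag{58}$$
$$E = \{(x, y) \in \mathcal{X} \times \mathcal{X} \mid x \subseteq y,\ \mathrm{rk}(x) = \mathrm{rk}(y) - 1\}, \tag{59}$$
$$E' = \{(x', y') \in \mathcal{X}' \times \mathcal{X}' \mid x' \subseteq y',\ \mathrm{rk}'(x') = \mathrm{rk}'(y') - 1\}. \tag{60}$$
The adjacency matrix $A$ of $H$ can be decomposed into block matrices $A_{r_1, r_2}$ for $r_1, r_2 \in \{0, \ldots, \ell\}$ defined by:
$$A_{r_1, r_2} = \begin{cases} 0_{n_{r_1} \times n_{r_2}} & r_1 \neq r_2 + 1, \\ B_{r_1, r_2} & r_1 = r_2 + 1, \end{cases} \tag{61}$$
where $B_{r_1, r_2}$ is the matrix form of the neighborhood function $B_{r_1, r_2}$. The matrices $A_{r_1, r_2}$ can be viewed as multi-cellular cochains $h_{r_1, r_2} \in \mathcal{C}^{e_{r_1} + e_{r_2}}(\mathcal{X}, \mathbb{R})$, so $A$ can be realized as an element of
$$Q := \prod_{r_1 = 0,\, r_2 = 0}^{\ell} \mathcal{C}^{e_{r_1} + e_{r_2}}(\mathcal{X}, \mathbb{R}).$$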
Recall that all neighborhood matrices $B_{r_1, r_2}$ are given as input to the MCN model, so we can recover $A$. To show that MCN can simulate any $k$-IGN update on $A$, we need to show that it can compute $L(A)$ for any $S_n$-equivariant linear function $L : Q^{\otimes k} \to Q^{\otimes k'}$, where $Q^{\otimes k}$ denotes the tensor product of $Q$ with itself $k$ times. Let $G < S_n$ be the subgroup of permutations preserving the subsets $\{1, \ldots, n_0\}$, $\{n_0 + 1, \ldots, n_0 + n_1\}$, $\ldots$, $\{n_0 + \cdots + n_{\ell - 1} + 1, \ldots, n_0 + \cdots + n_\ell\}$, so that $G \cong S_{n_0} \times \cdots \times S_{n_\ell} \subseteq S_n$. Since $G$ is a subgroup of $S_n$, all $S_n$-equivariant linear maps are also $G$-equivariant. Thus it is enough to show that we can compute $L(h)$ for all $G$-equivariant linear maps $L : Q^{\otimes k} \to Q^{\otimes k'}$.
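The space $Q = \prod_{r_1 = 0,\, r_2 = 0}^{\ell} \mathcal{C}^{e_{r_1} + e_{r_2}}(\mathcal{X}, \mathbb{R})$ can be embedded into the multi-cellular cochain space $\mathcal{C}^{\mathbf{1}_{\ell + 1}}(\mathcal{X}, \mathbb{R}^{(\ell + 1)^2})$ via the following map:
$$T(h)(x_0, \ldots, x_\ell) = \Big\Vert_{r_1 = 0,\, r_2 = 0}^{\ell}\, h_{r_1, r_2}(x_{r_1}, x_{r_2}), \tag{62}$$
where $\Vert$ stands for concatenation, $\mathbf{1}_{\ell + 1} = (1, \ldots, 1) \in \mathbb{R}^{\ell + 1}$ is the all-ones vector, $x_r \in \mathcal{X}_r$ is a cell of rank $r$, and $h \in Q$ is composed of the multi-cellular cochains $h_{r_1, r_2} \in \mathcal{C}^{e_{r_1} + e_{r_2}}(\mathcal{X}, \mathbb{R})$. MCN can use any linear function $L : \mathcal{C}^{k \cdot \mathbf{1}_{\ell + 1}}(\mathcal{X}, \mathbb{R}^{(\ell + 1)^2}) \to \mathcal{C}^{k' \cdot \mathbf{1}_{\ell + 1}}(\mathcal{X}, \mathbb{R}^{(\ell + 1)^2})$ which is $G$-equivariant, and so it can compute $L(h)$ for all linear maps as defined above, concluding the proof.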

Section: F SMCN EXPRESSIVE POWER
Section: F.1 TOPOLOGICAL AND METRIC PROPERTIES
In this section, we formally demonstrate SMCN's ability to mitigate many of the expressive limitations demonstrated in Appendix B. We begin by providing a useful lemma that allows us to leverage several expressivity results from the subgraph GNN literature in our setting. We then provide an in-depth discussion of the ability of SMCN to express each of the four aforementioned metric/topological properties: diameter, orientability, planarity, and homology.
Lemma F.1. For any CS-GNN (Bar-Shalom et al., 2024) model $\mathcal{M}$ operating on the Hasse graph $H_{(co)A_{r_1, r_2}}$ using cells of rank $r \geq r_1$ as super-nodes, there exists an SMCN model $\mathcal{M}'$ such that for any CC $\mathcal{X}$ of dimension $\geq r_1, r_2, r$, we have $\mathcal{M}(H_{(co)A_{r_1, r_2}}) = \mathcal{M}'(\mathcal{X})$.
Proof. First, note that the incidence matrix $B_{r_1, r} \in \mathcal{C}^{e_{r_1} + e_r}$ is equivalent to the "simple node marking" defined in Bar-Shalom et al. (2024), so SMCN can recover the input to the CS-GNN architecture. Second, by taking
$$\mathrm{MLP}_{r, r'}(x, y) = \begin{cases} \mathrm{MLP}(x, y) & r = r_2 \text{ and } r' = r_1, \\ 0 & \text{otherwise} \end{cases} \tag{63}$$
for some fixed MLP, Equation 55 becomes identical to the CS-GNN update.
Remark F.2. For the case where r = r 1 (i.e. super-nodes are regular Hasse graph nodes) the CS-GNN architecture becomes equivalent to GNN-SSWL+ (Zhang et al., 2023b).
Diameter. We first show that SMCN is capable of fully leveraging the information provided by the (cross) diameters of an input CC; see Appendix B for a definition.
Proposition F.3 (SMCN can compute diameters). If $\mathcal{X}, \mathcal{X}'$ are CCs such that
$$\mathrm{diam}^{r}_{A_{r_1, r_2}}(\mathcal{X}) \neq \mathrm{diam}^{r}_{A_{r_1, r_2}}(\mathcal{X}') \tag{64}$$
for $r_1, r_2, r \in \mathbb{N}$ with $r_1 \leq r$, then there exists an SMCN model $\mathcal{M}$ such that $\mathcal{M}(\mathcal{X}) \neq \mathcal{M}(\mathcal{X}')$.
Proof. In Zhang et al. (2023b), it was shown that GNN-SSWL+, with standard node marking applied to a graph $G = (V, E)$, can compute a final feature representation:
$$h^{(T)}_{u, v} = d_G(u, v) \quad \text{for } u, v \in V. \tag{65}$$
By taking the maximum over $h^{(T)}_{u, v}$, GNN-SSWL+ can distinguish between graphs with different diameters. Similarly, it was shown in Bar-Shalom et al. (2024) that CS-GNN with standard node marking applied to a graph $G = (V, E)$ and super-node set $V^*$ can compute a final feature representation
$$h^{(T)}_{S, v} = d_G(S, v) \quad \text{for } v \in V \text{ and } S \in V^*. \tag{66}$$
By taking the maximum over $h^{(T)}_{S, v}$, CS-GNN with standard node marking can distinguish between graphs with different cross diameters. Thus, applying Lemma F.1 and Remark F.2 to the Hasse graph $H_{A_{r_1, r_2}}$ with $\mathcal{X}_r$ as "super-nodes", we get that SMCN can distinguish between CCs with different (cross) diameters.
Orientability and planarity. We now show that SMCN is able to separate the cylinder and the Möbius strip. This implies that SMCN is strictly better than HOMP at detecting planarity and orientability. Whether SMCN can fully detect orientability or planarity is still an open question and is left for future work.
Proposition F.4 (SMCN can separate a cylinder and a Möbius strip). For any two integers $h, p \in \mathbb{N}$ such that $h, p \geq 3$, there exists an SMCN model $\mathcal{M}$ such that:
$$\mathcal{M}(\mathrm{Cyl}_{h, p}) \neq \mathcal{M}(\mathrm{Möb}_{h, p}). \tag{67}$$
Proof. First, using the terms "edge" and "1-cell" interchangeably, we define two types of edges on $\mathrm{Cyl}_{h, p}$ and $\mathrm{Möb}_{h, p}$. An edge $x \in \mathcal{X}_1$ is called an interior edge if $|B_{1,2}(x)| > 1$; otherwise it is called a boundary edge. We denote the boundary edge graph of a CC $\mathcal{X}$ (whose node set consists of the nodes contained in the boundary edges and whose edge set consists of the boundary edges themselves) by $\partial\mathcal{X}$. We construct the model $\mathcal{M}$ by first using a $B_{1,2}$ aggregation to get the cochain $h^{(1)} \in \mathcal{C}^{e_1}(\mathcal{X}, \mathbb{R})$:
$$h^{(1)}(x) = \deg_{B_{1,2}}(x). \tag{68}$$
Next, we use an equivariant linear update to construct the multi-cellular cochain $h^{(2)} \in \mathcal{C}^{2e_1}(\mathcal{X}, \mathbb{R}^2)$ defined by:
$$h^{(2)}_{x_1, x_2} = \deg_{B_{1,2}}(x_1) \,\Vert\, \deg_{B_{1,2}}(x_2), \tag{69}$$
where $\Vert$ denotes concatenation. Recall that the matrix form of $coA_{1,0}$ defines a cochain $h_{coA_{1,0}} \in \mathcal{C}^{2e_1}$ which can be used as input to SMCN. Using $h_{coA_{1,0}}$, we can now construct
$$h^{(3)}_{x_1, x_2} = (h_{coA_{1,0}})_{x_1, x_2} \,\Vert\, \deg_{B_{1,2}}(x_1) \,\Vert\, \deg_{B_{1,2}}(x_2). \tag{70}$$
Finally, using a stack of equivariant linear layers, we can construct a fourth cochain $h^{(4)}_{x_1, x_2} = \mathrm{MLP}(h^{(3)}_{x_1, x_2})$. We use the Memorization Theorem (Yun et al., 2019) and choose an MLP that satisfies
$$\mathrm{MLP}(a, b, c) = \begin{cases} 1 & a = b = c = 1, \\ 0 & \text{otherwise.} \end{cases} \tag{71}$$
With this choice, $h^{(4)}$ represents the adjacency matrix of $\partial\mathcal{X}$. $\partial\mathrm{Cyl}_{h, p}$ is composed of two disconnected cycles of length $p$, while $\partial\mathrm{Möb}_{h, p}$ is composed of a single cycle of length $2p$. These two graphs are distinguishable by subgraph architectures like GNN-SSWL+. Thus, using Lemma F.1 and Remark F.2, we can continue the construction of $\mathcal{M}$ so that it is able to differentiate between $\mathrm{Cyl}_{h, p}$ and $\mathrm{Möb}_{h, p}$.
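The boundary argument in this proof can be checked numerically. The sketch below is illustrative only: it assumes a quadrilateral grid with $p$ columns glued cyclically and $h$ rows of vertices, where the Möbius strip is obtained by flipping the row index across the seam. This grid is a stand-in, since the paper's exact definitions of $\mathrm{Cyl}_{h,p}$ and $\mathrm{Möb}_{h,p}$ lie outside this section, and all function names are hypothetical. An edge is treated as a boundary edge when it lies in exactly one 2-cell, matching the degree criterion above.

```python
from collections import Counter, defaultdict


def strip_squares(h, p, twisted):
    """2-cells of a quad grid strip with p columns and h rows of vertices."""

    def vertex(i, j):
        # Column p is glued back to column 0; a twisted strip flips the row.
        if i == p:
            return (0, h - 1 - j) if twisted else (0, j)
        return (i, j)

    return [
        [vertex(i, j), vertex(i + 1, j), vertex(i + 1, j + 1), vertex(i, j + 1)]
        for i in range(p)
        for j in range(h - 1)
    ]


def boundary_cycle_lengths(h, p, twisted):
    """Sorted sizes of the connected components of the boundary edge graph."""
    degree = Counter()  # number of 2-cells containing each edge
    for square in strip_squares(h, p, twisted):
        for k in range(4):
            degree[frozenset((square[k], square[(k + 1) % 4]))] += 1

    adjacency = defaultdict(set)
    for edge, count in degree.items():
        if count == 1:  # boundary edge
            u, v = tuple(edge)
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, sizes = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        sizes.append(size)
    return sorted(sizes)


print(boundary_cycle_lengths(3, 4, twisted=False))  # [4, 4]: two cycles of length p
print(boundary_cycle_lengths(3, 4, twisted=True))   # [8]: one cycle of length 2p
```

Each component of the boundary graph is a cycle, so its number of nodes equals its length.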
Homology. We first show that SMCN is able to count the number of connected components i.e. the 0-th homology.
Proposition F.5 (SMCN can count connected components). Let X , X ′ be CCs. If the number of connected components of the augmented Hasse graphs H Ar 1 ,r 2 and H ′ Ar 1 ,r 2 is different for some r 1 , r 2 ∈ N then there exists an SMCN model M such that M(X ) ̸ = M(X ′ ).
Proof. For a graph $G$, $C(G)$ denotes the set of connected components of $G$, and $G_v$ denotes the connected component of a node $v \in V$. Using Lemma F.1 and Remark F.2, it suffices to show that GNN-SSWL+ can distinguish graphs with different numbers of connected components. It was shown in Zhang et al. (2023b) that adding an additional aggregation of the form $h_{u, v} \mapsto \sum_{v' \in V} h_{u, v'}$ to GNN-SSWL+ does not affect its capacity to separate graphs. Therefore, for the remainder of this proof, we include this aggregation in GNN-SSWL+. As previously demonstrated, GNN-SSWL+ can compute a feature vector of the form:
$$h^{(t)}_{u, v} = d_G(u, v) \quad \text{for } u, v \in V. \tag{72}$$
If $u$ and $v$ are in different connected components, their distance is encoded as $-1$. Let $g_1 : [-1, |V|] \to \mathbb{R}$ be a continuous function such that:
$$g_1(x) = \begin{cases} 0 & x = -1, \\ 1 & x \geq -\tfrac{1}{2}. \end{cases} \tag{73}$$
We can approximate $g_1$ using an MLP and apply it to $h^{(t)}_{u, v}$ to obtain:
$$h^{(t+1)}_{u, v} = \begin{cases} 0 & v \notin G_u, \\ 1 & v \in G_u. \end{cases} \tag{74}$$
We now take $h^{(t+2)}_{u, v} = \sum_{v' \in V} h^{(t+1)}_{u, v'}$ to get
$$h^{(t+2)}_{u, v} = |G_u|. \tag{75}$$
Define $g_2 : [1, |V|] \to \mathbb{R}$ to be
$$g_2(x) = \frac{1}{x}. \tag{76}$$
We can approximate $g_2$ using an MLP and apply it to $h^{(t+2)}_{u, v}$ to obtain
$$h^{(t+3)}_{u, v} = \frac{1}{|G_u|}. \tag{77}$$
It was shown in Zhang et al. (2023b) that the final output of a GNN-SSWL+ model can be computed based on the final feature vector $h^{(T)}_{u, v}$ by
$$h_{\mathrm{out}} = \sum_{u, v \in V} h^{(T)}_{u, v}. \tag{78}$$
Applying this to $h^{(t+3)}_{u, v}$, we get
$$h_{\mathrm{out}} = \sum_{u, v \in V} \frac{1}{|G_u|} = \sum_{G^* \in C(G)} \sum_{u \in G^*} \frac{|V|}{|G^*|} = \sum_{G^* \in C(G)} |V| = |V|\,|C(G)|. \tag{79}$$
Now let $G, G'$ be a pair of graphs with a different number of connected components. If these two graphs have a different number of nodes, they can be easily distinguished by GNN-SSWL+. On the other hand, if they have the same number of nodes, they can be distinguished by GNN-SSWL+ based on Equation 79. Thus, we have shown that an augmented GNN-SSWL+ model can distinguish between $H_{A_{r_1, r_2}}$ and $H'_{A_{r_1, r_2}}$, and therefore there exists an SMCN model $\mathcal{M}$ that can separate $\mathcal{X}$ and $\mathcal{X}'$.
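The identity in Equation 79 is easy to verify on a small example. The following sketch computes the readout directly rather than through a trained GNN-SSWL+ model: it finds $|G_u|$ for every node by breadth-first search and sums $1/|G_u|$ over all ordered pairs $(u, v)$. The function names and the example graph are illustrative assumptions; exact rational arithmetic is used to avoid rounding.

```python
from collections import deque
from fractions import Fraction


def component_sizes(adjacency):
    """Map every node u to |G_u|, the size of its connected component."""
    size_of = {}
    for start in adjacency:
        if start in size_of:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        for node in component:
            size_of[node] = len(component)
    return size_of


def readout(adjacency):
    """Sum of 1/|G_u| over all ordered node pairs (u, v), as in Equation 79."""
    size_of = component_sizes(adjacency)
    return sum(Fraction(1, size_of[u]) for u in adjacency for _v in adjacency)


# A triangle together with a disjoint edge: |V| = 5 and two connected components.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
num_nodes = len(graph)
print(readout(graph))              # 10, i.e. |V| * |C(G)|
print(readout(graph) / num_nodes)  # 2, the number of connected components
```

Dividing the readout by $|V|$ recovers the number of components, which is why two graphs with the same number of nodes but different component counts receive different outputs.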
Since the 0-th homology satisfies $H_0(\mathcal{X}) = \mathbb{Z}^{|C(\mathcal{X})|}$, we additionally get the following corollary.
Corollary F.6 (SMCN can compute the 0-th homology). If $\mathcal{X}, \mathcal{X}'$ are CCs such that the 0-th homology groups of their induced topological spaces are different, then there exists an SMCN model $\mathcal{M}$ such that $\mathcal{M}(\mathcal{X}) \neq \mathcal{M}(\mathcal{X}')$.
Exploring SMCN's capacity to differentiate between CCs based on their higher-order homology groups is left for future work. As a first step, we show that SMCN can successfully separate a natural family of CCs -two-dimensional surfaces embeddable in R 3 -based on any homology group/Betti number.
Proposition F.7 (SMCN can compute homology groups of surfaces). Let X , X ′ be two cellular complexes that are realizations of 2-dimensional manifolds (with or without boundary) M, M ′ which are embeddable in R 3 . If ∃r ∈ N such that H r (M) ̸ = H r (M ′ ) then there is an SMCN model M such that M(X ) ̸ = M(X ′ ).
Proof. First, since $M$ is 2-dimensional, the only non-trivial homology groups it may have are of order $0 \leq r \leq 2$. The 0-th homology group of $M$ is of the form $H_0(M) = \mathbb{Z}^{k_0}$, where $k_0$ is the number of connected components of $M$. Furthermore, as each connected component of $M$ is a connected 2-dimensional manifold (possibly with boundary) that can be embedded in $\mathbb{R}^3$, it must either be closed and orientable or have a non-empty boundary. If such a component is closed and orientable, then by Poincaré duality its second homology group is $\mathbb{Z}$. On the other hand, if it has a boundary, it is homotopy equivalent to a 1-dimensional cellular complex, and thus its second homology group is trivial. Therefore, $H_2(M) = \mathbb{Z}^{k_2}$, where $k_2$ is the number of connected components of $M$ with no boundary. Finally, since $M$ is embeddable in $\mathbb{R}^3$, its first homology group is $H_1(M) = \mathbb{Z}^{k_1}$ for some integer $k_1$. The Euler characteristic of the manifold $M$, defined by $k_0 - k_1 + k_2$, can be computed from the number of cells of $\mathcal{X}$ using the following formula:
$$\chi(M) = k_2 - k_1 + k_0 = |\mathcal{X}_2| - |\mathcal{X}_1| + |\mathcal{X}_0|. \tag{80}$$
Thus, in order to separate $\mathcal{X}$ from $\mathcal{X}'$, we need to be able to construct an SMCN model that separates CCs that differ in any one of the following three quantities:
1. The Euler characteristic.
2. The number of connected components.
3. The number of connected components with no boundary.
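To make the reduction concrete, the three quantities listed above determine all Betti numbers through Equation 80, since $k_1 = k_0 + k_2 - \chi$. The sketch below is a worked illustration with a hypothetical helper name; the cell counts are standard examples (a cube surface as a sphere, a $3 \times 3$ periodic quad grid as a torus, and a quad-grid cylinder), not data from the paper.

```python
def betti_numbers(cell_counts, num_components, num_closed_components):
    """Return (k0, k1, k2) for a surface embeddable in R^3.

    cell_counts is (|X_0|, |X_1|, |X_2|); num_closed_components counts the
    connected components that have no boundary.
    """
    n0, n1, n2 = cell_counts
    euler_characteristic = n2 - n1 + n0
    k0 = num_components
    k2 = num_closed_components
    k1 = k0 + k2 - euler_characteristic  # rearranged Equation 80
    return k0, k1, k2


# Surface of a cube (a sphere): 8 vertices, 12 edges, 6 faces.
print(betti_numbers((8, 12, 6), 1, 1))   # (1, 0, 1)
# 3 x 3 periodic quad grid (a torus): 9 vertices, 18 edges, 9 faces.
print(betti_numbers((9, 18, 9), 1, 1))   # (1, 2, 1)
# Quad-grid cylinder with 4 columns and 3 vertex rows: 12 vertices, 20 edges, 8 faces.
print(betti_numbers((12, 20, 8), 1, 0))  # (1, 1, 0)
```

The torus and the cylinder share Euler characteristic 0 and one connected component, so only the third quantity (components without boundary) tells them apart here.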
The Euler characteristic can be computed by standard HOMP updates, as it is a function of the sizes of $\mathcal{X}_0$, $\mathcal{X}_1$, and $\mathcal{X}_2$. For the second quantity, we have seen in Proposition F.5 that SMCN models can separate CCs with a different number of connected components. As for the third quantity, a connected component of $\mathcal{X}$ has a boundary if and only if it contains 1-cells whose degree with respect to the neighborhood function $B_{1,2}$ is exactly 1. We can use a stack of standard HOMP layers to compute the 1-cell features [...]

Proof. In Proposition B.15, we find a family of graph pairs whose triangular lifts are indistinguishable by HOMP despite having different diameters of type $\mathrm{diam}^{2}_{A_{0,1}}$. In Proposition F.3 we saw that using only SCL updates, SMCN can compute this type of diameter and is thus able to distinguish between the aforementioned pairs of CCs. Recall the SCL update rule:
h x = 1 x
$$m_{u,v}(x, y) = \sum_{r = 0,\, r' = 0}^{\ell} \mathrm{MLP}_{r, r'}\Big(h^{(t)}_{x, y},\ h^{(t)}_{(co)A_{r_1, r}(x),\, y},\ h^{(t)}_{x,\, (co)A_{r_2, r'}(y)},\ h^{(t)}_{x,\, B_{r_1, r_2}(x)},\ h^{(t)}_{B^{\top}_{r_2, r_1}(y),\, y}\Big),$$
where, if $Q_1 \subseteq \mathcal{X}_{r_1}$ and $Q_2 \subseteq \mathcal{X}_{r_2}$ are sets of cells, $h_{Q_1, y} := \sum_{x' \in Q_1} h_{x', y}$ and $h_{x, Q_2} := \sum_{y' \in Q_2} h_{x, y'}$. Observing the proof of Proposition F.3, we note that in order to compute $\mathrm{diam}^{2}_{A_{0,1}}$ it is enough to use only the values corresponding to $r = r' = 1$, $r_1 = 0$, $r_2 = 2$. The asymptotic runtime of this type of SCL layer is $O(m_{\deg} \cdot n_0 \cdot n_2)$. Thus the overall runtime of our SMCN model is $O(m_{\deg} \cdot n_0 \cdot n_2 \cdot T)$, completing the proof.
We now move on to MOG pooling.
Proposition F.9. There exist pairs of graphs $G$ and $G'$ such that the combinatorial complexes $\mathcal{X} = \mathrm{MOG}(G)$ and $\mathcal{X}' = \mathrm{MOG}(G')$ are indistinguishable by HOMP, but can be distinguished by an SMCN model with asymptotic runtime $O(d \cdot n_0 \cdot n_2 \cdot T)$, where $n_0$ is the number of nodes in the original graph, $n_2$ is the number of 2-rank cells constructed by mapper, $d$ is the maximal degree, and $T$ is the number of layers.
Proof. In Proposition B.17, we saw a family of graph pairs for which the CCs obtained by MOG pooling are indistinguishable by HOMP despite having different $(0, 1, 2)$ cross diameters. In Proposition F.3 we saw that using only SCL updates, SMCN can compute this type of diameter and is thus able to distinguish between the aforementioned pairs of CCs. As seen in the proof above, by choosing an appropriate aggregation function the runtime of each SCL layer becomes $O(d \cdot n_0 \cdot n_2)$ (note that by stacking layers that use this aggregation we are still able to compute $\mathrm{diam}^{2}_{A_{0,1}}$). Thus the overall runtime of our SMCN model is $O(d \cdot n_0 \cdot n_2 \cdot T)$, completing the proof.

Section: F.3 COMPARING THE EXPRESSIVE POWER OF SMCN WITH BASELINES
Section 7 compares the empirical performance of SMCN with several relevant baselines. In this section, we briefly discuss the expressive power of SMCN in comparison to each of these baselines.
GNNs. In Section 7, we evaluate SMCN against various MPNNs (e.g., GIN, GCN), subgraph networks (e.g., DS-GNN (Bevilacqua et al., 2021), SUN (Frasca et al., 2022), GNN-SSWL+ (Zhang et al., 2023b)), and other expressive GNNs (e.g., PPGN (Maron et al., 2019), PPGN+ (Puny et al., 2023)). It is important to note that since GNNs are designed to process graphs, a meaningful comparison of expressivity with SMCN is only valid when SMCN is applied to lifted graphs (see discussion in Section 4.3). Among all MPNNs and subgraph networks, GNN-SSWL+ is the most expressive. Since SMCN can implement GNN-SSWL+ using the $A_{0,1}$ neighborhood function (which corresponds to the original graph), it follows that SMCN is at least as expressive as all the MPNNs and subgraph networks, regardless of the chosen lifting procedure. Furthermore, SMCN can implement edge deletion subgraph policies, which, as demonstrated in Bevilacqua et al. (2021), are capable of distinguishing certain instances of graphs that are indistinguishable by the 3-WL test. Since the expressivity of GNN-SSWL+ has been shown in Zhang et al. (2023b) to be bounded by the 3-WL test, it follows that SMCN is strictly more expressive than all the aforementioned MPNNs and subgraph networks. Finally, the expressive power of PPGN and PPGN+ has also been shown to be bounded by the 3-WL test (Maron et al., 2019; Puny et al., 2023), indicating that there are instances of graphs distinguishable by SMCN but not by PPGN or PPGN+. A comprehensive comparison of the expressive power of PPGN and SMCN is deferred to future work.
Topological neural networks. We compare SMCN with four topological neural networks: CIN (Bodnar et al., 2021a), CIN++ (Giusti et al., 2023), CIN + CycleNet (Yan et al., 2024) and Cellular Transformer (Ballester et al., 2024). SMCN can directly implement CIN and CIN++ through the use of suitable tensor diagrams, establishing it as at least as expressive as these architectures. Furthermore, CIN and CIN++ are HOMP-based architectures. As demonstrated in Section 6.1, SMCN can distinguish between CCs that HOMP architectures cannot, making it strictly more expressive.
Additionally, CycleNet processes graphs by applying BasisNet + spectral embedding (Lim et al., 2022) to the 1-Hodge Laplacian of the input graph. The resulting output is then used as edge features, which are subsequently processed by a final CIN applied to the graph lifted to a cellular complex by cyclic lift. In cases where the input graphs are simple and undirected, the 1-Hodge Laplacian is exactly equal to the coadjacency matrix $coA_{0,1}$ (plus a diagonal term of $2I$ which can be ignored). SMCN can apply subgraph GNNs to the $coA_{0,1}$ Hasse graph, and use the resulting features as inputs for a CIN model. This reduces our proof to demonstrating that subgraph GNNs are more expressive than BasisNet. As recently shown in Zhang et al. (2024), BasisNet + spectral embedding is strictly less expressive than PSWL, a subclass of subgraph GNNs that is itself less expressive than the base subgraph GNN used in SMCN. This shows that SMCN is strictly more expressive than CIN + CycleNet.
Finally, the Cellular Transformer breaks equivariance with respect to G = S n1 × • • • × S n ℓ using Laplacian positional encoding, allowing it to gain expressivity. Architectures that break equivariance are often fully expressive, but they can also incorrectly differentiate between instances of isomorphic CCs. Consequently, the Cellular Transformer is not a fair comparison to SMCN in terms of expressivity.

Section: G EXPERIMENTAL DETAILS
Setup. All models are implemented in PyTorch (Paszke et al., 2019) using the PyTorch Geometric framework (Fey & Lenssen, 2019). We used TopoNetX (Hajij et al., 2024) and NetworkX (Hagberg et al., 2008) to perform lifting operations. Hyperparameter tuning is carried out using Weights and Biases (Biewald, 2020). All experiments were conducted on a single NVIDIA A100-SXM4-40GB GPU. For each experiment, we report the mean and standard deviation over 5 runs with random seeds from 1 to 5. Reported test scores are computed at the epoch achieving the best score.

Section: G.1 MODEL IMPLEMENTATION
The SMCN framework is highly flexible, allowing for a wide range of model construction approaches. To narrow down the search space, we focus on two types of tensor diagrams, sequential and parallel, each composed of smaller blocks and updates formally defined below. As in the main body, we sometimes omit the subscript in $h_r$ when the rank is clear from context. We also use the notation $h_Q = \sum_{x \in Q} h_x$. Additionally, we use the augmented concatenation operator $\mathrm{Cat}(x \,\Vert\, y) := \mathrm{MLP}_1(\mathrm{MLP}_2(x) \,\Vert\, \mathrm{MLP}_3(y))$. All MLPs have a single hidden layer.
[Figure 12: CIN block. Tensor diagram over $C_0$, $C_1$, $C_2$ with edges labeled $A_{0,1}$, $B_{0,1}$, $A_{1,2}$, $B_{1,2}$ and Id.]
Initialization. In the case X is lifted from a graph with node and edge features, these features are used as the 0 and 1 cochains respectively. Otherwise, the 0, and 1 cochains are initialized with zeros.
In case the initial features are categorical, we apply an embedding layer. Similarly to Bodnar et al. (2021a), we initialize the 2-cell cochain by
$$(h^{(0)}_2)_x = (h^{(0)}_0)_{B^{\top}_{2,0}(x)} = \sum_{x' \in B^{\top}_{2,0}(x)} (h^{(0)}_0)_{x'}. \tag{82}$$
CIN block. The first HOMP block we consider is the CIN block from Bodnar et al. (2021a), whose tensor diagram is illustrated in Figure 12. A CIN block updates the cochains $h^{(t)}_0$, $h^{(t)}_1$ and $h^{(t)}_2$ via the following update rules. For $x \in \mathcal{X}_0$,
$$h^{(t+1)}_x = \mathrm{MLP}^{(t)}_{0,1}\Big((1 + \epsilon_0)\, h^{(t)}_x + \sum_{x' \in A_{0,1}(x)} \mathrm{MLP}^{(t)}_{0,2}\big(h^{(t)}_{x'} \,\Vert\, h^{(t)}_{E(x, x')}\big)\Big), \tag{83}$$
for $x \in \mathcal{X}_1$,
$$h^{(t+1)}_x = \mathrm{Cat}\Big((1 + \epsilon^{1}_{1})\, h^{(t)}_x + h^{(t)}_{B^{\top}_{1,0}(x)} \,\Big\Vert\, (1 + \epsilon^{2}_{1})\, h^{(t)}_x + \sum_{x' \in A_{1,2}(x)} \mathrm{MLP}^{(t)}_{0,2}\big(h^{(t)}_{x'} \,\Vert\, h^{(t)}_{E(x, x')}\big)\Big), \tag{84}$$
and for $x \in \mathcal{X}_2$,
$$h^{(t+1)}_x = \mathrm{MLP}^{(t)}_{2}\Big((1 + \epsilon_2)\, h^{(t)}_x + h^{(t)}_{B^{\top}_{2,1}(x)}\Big), \tag{85}$$
where, if $x, x' \in \mathcal{X}_r$, $E(x, x') = \{y \in \mathcal{X}_{r+1} \mid x, x' \subseteq y\}$, and $\epsilon_0, \epsilon^{1}_{1}, \epsilon^{2}_{1}, \epsilon_2$ are non-learnable hyperparameters.
[Figure: custom HOMP block. Tensor diagram over $C_0$, $C_1$, $C_2$.]
Custom HOMP block. In addition to CIN blocks, we also use "custom HOMP" blocks that are designed to minimize the influence of 1-cells in the block update, thereby reducing dependence on the edge structure of graphs. Empirical results, as shown in Table 2, demonstrate that this approach improves performance on certain topological property prediction tasks. The exact update for this block is defined via the following rule. For $x \in \mathcal{X}_0$,
$$h^{(t+1)}_x = \mathrm{Cat}\Big((1 + \epsilon^{1}_{0})\, h^{(t)}_x + h^{(t)}_{B_{0,2}(x)} \,\Big\Vert\, (1 + \epsilon^{2}_{0})\, h^{(t)}_x + h^{(t)}_{A_{0,1}(x)}\Big), \tag{86}$$
for $x \in \mathcal{X}_1$,
$$h^{(t+1)}_x = h^{(t)}_x, \tag{87}$$
and for $x \in \mathcal{X}_2$,
$$h^{(t+1)}_x = \mathrm{MLP}^{(t)}\Big((1 + \epsilon_2)\, h^{(t)}_x + h^{(t)}_{B^{\top}_{2,0}(x)}\Big). \tag{88}$$
Multi-cellular cochain initialization. SCL layers take as input multi-cellular cochains of the form $h^{(t)}_{0,1} \in \mathcal{C}_{0,1}$ and $h^{(t)}_{0,2} \in \mathcal{C}_{0,2}$. These multi-cellular cochains are initialized with
$$(h^{(t)}_{0,r})_{x_1, x_2} = \mathrm{MLP}^{(t)}_{r,1}\big((h^{(t)}_0)_{x_1}\big) + \mathrm{MLP}^{(t)}_{r,2}\big((h^{(t)}_r)_{x_2}\big) + \mathrm{MLP}^{(t)}_{r,3}\big(\mathrm{mark}(x_1, x_2)\big), \tag{89}$$
where $\mathrm{mark}(x_1, x_2)$ is a marking strategy. Similarly to Zhang et al. (2023b) and Bar-Shalom et al. (2024), we consider two marking strategies:
(1) binary marking:
$$\mathrm{mark}_B(x_1, x_2) = \begin{cases} 1 & x_1 \in B_{0,r}(x_2), \\ 0 & \text{otherwise;} \end{cases} \tag{90}$$
(2) distance-based marking:
$$\mathrm{mark}_D(x_1, x_2) = \min_{x \in B^{\top}_{r,0}(x_2)} d_{A_{0,1}}(x_1, x), \tag{91}$$
where $d_{A_{0,1}}$ is the shortest path distance on $H_{A_{0,1}}$.
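The two marking strategies can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the graph is given as an adjacency dictionary, a cell $x_2$ is represented by the set of nodes it contains, unreachable nodes are assigned distance infinity (a convention chosen here, not taken from the paper), and all function names are hypothetical.

```python
import math
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def binary_mark(node, cell):
    """Equation 90: 1 if the node is incident to the cell, 0 otherwise."""
    return 1 if node in cell else 0


def distance_mark(adjacency, node, cell):
    """Equation 91: distance from the node to the closest node of the cell."""
    distance = shortest_path_lengths(adjacency, node)
    return min(distance.get(x, math.inf) for x in cell)


# Path graph 0 - 1 - 2 - 3, with the edge {2, 3} treated as a 1-cell.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cell = frozenset({2, 3})
print(binary_mark(0, cell), distance_mark(path, 0, cell))  # 0 2
print(binary_mark(3, cell), distance_mark(path, 3, cell))  # 1 0
```

The distance-based mark is zero exactly when the binary mark is one, so it refines the binary marking rather than replacing it.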
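SCL updates. We use two types of SCL updates, 1-SCL and 2-SCL, defined by
$$h^{(t+1)}_{x_1, x_2} = \mathrm{Cat}_1\Big((1 + \epsilon^{1}_{1})\, h^{(t)}_{x_1, x_2} + \sum_{x' \in A_{0,1}(x_1)} \mathrm{MLP}^{(t)}_{1}\big(h^{(t)}_{x', x_2} \,\Vert\, h^{(t)}_{E(x_1, x')}\big) \,\Big\Vert\, (1 + \epsilon^{2}_{1})\, h^{(t)}_{x_1, x_2} + h^{(t)}_{x_1, B_{0,1}(x_1)}\Big) \tag{92}$$
for $x_1 \in \mathcal{X}_0$, $x_2 \in \mathcal{X}_1$, and
$$h^{(t+1)}_{x_1, x_2} = \mathrm{Cat}_2\Big((1 + \epsilon^{1}_{2})\, h^{(t)}_{x_1, x_2} + \sum_{x' \in A_{0,1}(x_1)} \mathrm{MLP}^{(t)}_{2}\big(h^{(t)}_{x', x_2} \,\Vert\, h^{(t)}_{E(x_1, x')}\big) \,\Big\Vert\, (1 + \epsilon^{2}_{2})\, h^{(t)}_{x_1, x_2} + h^{(t)}_{x_1, B_{0,2}(x_1)}\Big) \tag{93}$$
for $x_1 \in \mathcal{X}_0$, $x_2 \in \mathcal{X}_2$, respectively.
SCL pooling. To use a HOMP block after an SCL block, we need to pool the information from multi-cellular cochains to standard cochains. This is done via an SCL pooling block, defined by
$$(h_0)^{(t)}_x = \mathrm{MLP}^{(t)}_0\big((h^{(t)}_{0,r})_{x, \mathcal{X}_r}\big) = \mathrm{MLP}^{(t)}_0\Big(\sum_{x' \in \mathcal{X}_r} (h^{(t)}_{0,r})_{x, x'}\Big), \qquad (h_r)^{(t)}_x = \mathrm{MLP}^{(t)}_r\big((h^{(t)}_{0,r})_{\mathcal{X}_0, x}\big) = \mathrm{MLP}^{(t)}_r\Big(\sum_{x' \in \mathcal{X}_0} (h^{(t)}_{0,r})_{x', x}\Big). \tag{94}$$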
Readout. All tasks considered in this paper require predicting a single value per input CC. Therefore, we employ a final readout layer of the form:
$$h_{\mathrm{out}} = \mathrm{MLP}\Big(\mathrm{agg}_0\{\!\{h^{(T)}_x \mid x \in \mathcal{X}_0\}\!\} + \mathrm{agg}_1\{\!\{h^{(T)}_x \mid x \in \mathcal{X}_1\}\!\} + \mathrm{agg}_2\{\!\{h^{(T)}_x \mid x \in \mathcal{X}_2\}\!\} + \mathrm{agg}_3\{\!\{h^{(T)}_{x_1, x_2} \mid x_1 \in \mathcal{X}_0, x_2 \in \mathcal{X}_1\}\!\} + \mathrm{agg}_4\{\!\{h^{(T)}_{x_1, x_2} \mid x_1 \in \mathcal{X}_0, x_2 \in \mathcal{X}_2\}\!\}\Big), \tag{95}$$
where $T$ is the final layer and each $\mathrm{agg}_i$ is either mean aggregation, sum aggregation, or the zero function.
Tensor diagrams. We use two types of tensor diagrams: sequential and parallel, both illustrated in Figure 14. In a sequential tensor diagram, a stack of HOMP blocks is followed by a stack of SCL updates and then another stack of HOMP blocks. The parallel tensor diagram uses concurrent stacks of HOMP and SCL layers. Both the types of blocks/layers and the number of blocks/layers within each stack are treated as hyperparameters of the model.

Section: G.2 SYNTHETIC BENCHMARKS
Torus dataset. To construct the torus dataset we first select three parameters: m which specifies the number of nodes in the smallest CC in the dataset, M which specifies the number of nodes in the largest CC, and n, which specifies the maximum number of connected components in any CC within the dataset. The dataset is then constructed by iterating over all possible choices for the number of nodes and connected components, generating all possible disjoint unions of 2-dimensional tori with the specified parameters. We then select all the pairs that have the same size (number of nodes). As mentioned in the main text, each such pair is indistinguishable by HOMP despite differing in basic metric/topological properties: they either have distinct homology, or they differ in the diameters of some of the components. In our experiments, we use m = 18 (the smallest size that admits indistinguishable pairs), M = 40, and n = 3, resulting in 223 pairs.
To evaluate the ability of both HOMP and SMCN to distinguish between each pair, we follow the training and evaluation protocol presented in Wang & Zhang (2024). For each pair, we generate 64 copies where the order of cells is randomly permuted. The model is then trained to minimize the cosine similarity between the outputs corresponding to the two CCs in each pair. We measure the number of pairs where the output difference is statistically significant. Our results show that, while HOMP fails to distinguish any of the pairs, SMCN successfully differentiates all of them.
We use an SMCN model implementing a sequential tensor diagram composed of two CIN blocks, followed by four 1-SCL updates, and concluding with two additional CIN blocks. In comparison, the HOMP model consists of a stack of four consecutive CIN blocks, designed to have a comparable number of learnable parameters. The readout layer of both models uses a zero function as the aggregation for all multi-cellular cochains and a sum aggregation for all standard cochains. All models are trained for 20 epochs on each individual pair using a constant learning rate of 0.001. The embedding dimension used by all models is 128.
Lifted ZINC cross-diameter. We construct a CC dataset by adding cycles of length $\leq 18$ as 2-cells to graphs taken from the ZINC-12K dataset (Sterling & Irwin, 2015). We remove edge and node features, and predict the $(0, 1, 2)$ cross diameter, computed by:
$$\max_{x \in \mathcal{X}_0,\ y \in \mathcal{X}_2}\ \min_{x' \in y}\ d_{A_{0,1}}(x, x'), \tag{96}$$
where $d_{A_{0,1}}(x, x')$ is the shortest path distance with respect to the original graph. Training targets are normalized to have mean 0 and standard deviation 1. The model is trained using an MSE loss. At test time, we evaluate both the MSE of the normalized target and the accuracy in predicting the cross-diameter value, which has 18 possible outcomes. We compare three architectures: the first two are HOMP models which employ a stack of four CIN blocks and a stack of four custom HOMP blocks, respectively. The third is an SMCN model, which implements a sequential tensor diagram constructed by a single stack of six 2-SCL blocks, followed by a non-learnable pooling procedure as described in Equation 94. All models are constrained to a budget of 500K learnable parameters.
The readout layer of all three models is defined according to Equation 95, where $\mathrm{agg}_1$, $\mathrm{agg}_3$, $\mathrm{agg}_4$ are taken to be the zero function and $\mathrm{agg}_0$, $\mathrm{agg}_2$ are taken to be the mean function. Models are trained for 200 epochs using a constant learning rate of 0.0001. The hidden dimension used by all models is 64.
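For reference, the target in Equation 96 can be computed with plain breadth-first search. The sketch below is a minimal illustration, assuming a connected graph given as an adjacency dictionary and 2-cells given as sets of nodes; the function names and the toy graph are assumptions and are not taken from the released code.

```python
from collections import deque


def shortest_path_lengths(adjacency, source):
    """Breadth-first search distances from source in an unweighted graph."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def cross_diameter(adjacency, two_cells):
    """Equation 96: max over nodes and 2-cells of the node-to-cell distance.

    Assumes the graph is connected; returns 0 when there are no 2-cells.
    """
    best = 0
    for node in adjacency:
        distance = shortest_path_lengths(adjacency, node)
        for cell in two_cells:
            best = max(best, min(distance[x] for x in cell))
    return best


# A 6-cycle (nodes 0..5) with a tail 5 - 6 - 7; the cycle is lifted to one 2-cell.
graph = {
    0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4],
    4: [3, 5], 5: [4, 0, 6], 6: [5, 7], 7: [6],
}
ring = frozenset(range(6))
print(cross_diameter(graph, [ring]))  # 2: node 7 is two hops away from the ring
```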
Lifted ZINC second Betti number. For the second topological property prediction task, we tested our model's ability to learn to predict the second Betti number, i.e. the rank of the second homology group. To this end, we constructed our benchmark dataset in the following way. We started with the ZINC-FULL dataset (containing 250k molecular graphs), lifting all graphs to CCs as in the cross-diameter task. We then computed the second Betti number for each of the lifted graphs and randomly selected 850 samples from each of the 6 most common values (which were 0, 1, 2, 3, 4 and 6), resulting in a balanced dataset of size 5,100. We used a 60%, 20%, 20% random split for training, validation, and test sets. As before, we removed all node and edge features and normalized training targets to have mean 0 and standard deviation 1. The models were then trained using an MSE loss. At test time, we evaluated both the MSE of the normalized target and the accuracy of predicting the second Betti number. We used the same three models as in the previous experiment, with exactly the same hyperparameters.
The results of both lifted ZINC experiments are presented in Table 2. Following the experimental setting of Rieck (2023), in which TDL models are tasked with learning metric properties of graphs taken from the MOLHIV dataset (Hu et al., 2020), we report the model's accuracy in predicting the correct target value. We additionally provide the normalized MSE score of the model.
SMCN significantly outperforms both HOMP methods in learning both the cross-diameter and the second Betti numbers, achieving higher accuracy as well as significantly lower standard deviation indicating a more stable learning process. This is particularly evident in the custom HOMP model, which, although it surpasses the CIN model in performance, suffers from a considerably larger standard deviation. Additionally, the SMCN model achieves strong results after a significantly lower number of epochs compared to both HOMP models, as illustrated in Figure 15.
Moreover, the results of the three synthetic experiments further demonstrate SMCN's superior capability in capturing the topological properties of CCs compared to existing HOMP architectures.
While our theoretical analysis established that SMCN can express topological information beyond the capabilities of any HOMP architecture, the synthetic experiments validate that SMCN models can effectively learn and leverage these properties in practice.
G.3 REAL-WORLD GRAPH BENCHMARKS
ZINC Dataset (Sterling & Irwin, 2015; Dwivedi et al., 2023). The ZINC-12K dataset comprises 12,000 molecular graphs, extracted from the ZINC database, which is a collection of commercially available chemical compounds. These molecular graphs vary in size, ranging from 9 to 37 nodes each. In these graphs, nodes correspond to heavy atoms, encompassing 28 distinct atom types. Edges in the graphs represent chemical bonds, with three possible bond types. We perform regression on the constrained solubility (logP) of the molecules. The dataset is pre-partitioned into training, validation, and test sets, containing 10,000, 1,000, and 1,000 molecular graphs, respectively.
For this experiment, we use an SMCN model implementing a sequential tensor diagram. The architecture consists of a single CIN block, followed by a stack of five 1-SCL layers, and concludes with an additional CIN block. Each CIN block has an embedding dimension of 85, while each SCL layer uses an embedding dimension of 70, resulting in a model with fewer than 500k parameters, as outlined in Dwivedi et al. (2023). The readout layer is defined according to Equation 95, where agg 2 , agg 3 , and agg 4 are zero functions, and agg 0 and agg 1 are a sum aggregation. Since we observed that this model converges slowly, we trained it for 2000 epochs, following the approach of Ma et al. (2023). The learning rate is initialized at 0.001 and decays by a factor of 0.5 every 300 epochs.
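The learning-rate schedule stated above (initial value 0.001, halved every 300 epochs, 2000 epochs in total) corresponds to a standard step decay. The sketch below is a plain-Python illustration; the function name and the convention that epochs are counted from 0 are assumptions.

```python
def step_decay_lr(epoch, base_lr=1e-3, gamma=0.5, step_size=300):
    """Learning rate at a given 0-indexed epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at the start of each decay interval over 2000 epochs.
schedule = {epoch: step_decay_lr(epoch) for epoch in range(0, 2000, 300)}
```

Under these assumptions the rate is halved six times during training, so the final epochs run at 0.001 * 0.5^6.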
OGB Datasets (Hu et al., 2020). MOLHIV and MOLESOL are molecular property prediction datasets, adapted by the Open Graph Benchmark (OGB) from MoleculeNet. These datasets employ a unified featurization for nodes (atoms) and edges (bonds), encapsulating various chemophysical properties. The task in MOLHIV is to predict the capacity of compounds to inhibit HIV replication.
The task in MOLESOL is regression on water solubility (log solubility in mols per liter) for common organic small molecules.
For both datasets, we use an SMCN model that utilizes a parallel tensor diagram. For MOLHIV, the architecture consists of two CIN blocks and two parallel 2-SCL blocks. CIN blocks have an embedding dimension of 64 and dropout between layers with probability 0.2, while SCL layers have an embedding dimension of 24 and dropout between layers with probability 0.5. The final SCL layer is followed by a non-learnable pooling operation as per Equation 94. The readout layer is defined according to Equation 95, where agg 3 and agg 4 are zero functions, and agg 0 , agg 1 and agg 2 are mean aggregations. The model is trained for 100 epochs with a constant learning rate of 0.0001.
For MOLESOL, the architecture consists of two custom HOMP blocks and two parallel 1-SCL layers. HOMP blocks have an embedding dimension of 16, while SCL layers have an embedding dimension of 228; neither block uses dropout. The final SCL block is followed by a non-learnable pooling operation as per Equation 94. The readout layer is defined according to Equation 95, where agg 2 , agg 3 , and agg 4 are zero functions, and agg 0 and agg 1 are mean aggregations. The model is trained for 200 epochs with a constant learning rate of 0.0001.
Table 3: Accuracy on the MUTAG dataset.
Model | Accuracy (↑)
GIN (Xu et al., 2018) | 89.4 ± 5.6%
CIN (Bodnar et al., 2021a) | 92.7 ± 6.1%
PPGN (Maron et al., 2019) | 90.6 ± 8.7%
DS-GNN (Bevilacqua et al., 2021) | 91.0 ± 4.8%
DSS-GNN (Bevilacqua et al., 2021) | 91.1 ± 7.0%
SUN (Frasca et al., 2022) | 92.7 ± 5.8%
SMCN (ours) | 92.5 ± 6.2%
MUTAG. The MUTAG dataset, part of the TUDataset benchmarks (Morris et al., 2020), comprises 188 molecular graphs. The task is to identify mutagenic molecular compounds, which are relevant for the development of potentially marketable drugs (Kazius et al., 2005;Riesen & Bunke, 2008).
Our training setup and evaluation procedure adhere to those outlined in Xu et al. (2018). For this experiment, the SMCN model utilizes a sequential tensor diagram constructed with a single stack of six 2-SCL blocks, followed by a non-learnable pooling procedure, as detailed in Equation 94.
Results are presented in Table 3.

Section: G.4 RUNTIME EVALUATIONS
To empirically measure the runtime differences between SMCN, subgraph GNNs and HOMP, we ran several wall-clock measurements for dataset construction, single-epoch training and test-set evaluation. We evaluate the SMCN variants used for the ZINC and MOLHIV experiments, GNN-SSWL+, which is the backbone subgraph GNN for SMCN, and CIN, a standard HOMP model. Runtime was evaluated across 10 runs, and all experiments ran on a single NVIDIA A100 48GB GPU. Dataset construction times (in seconds) are:
• ZINC: 322.21 ± 8.764,
• MOLHIV: 678.01 ± 11.38.
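A wall-clock measurement of this kind, reported as mean ± standard deviation over repeated runs, can be sketched as follows. The helper name is hypothetical and the use of the sample standard deviation is an assumption; for GPU workloads one would additionally need to synchronize the device before reading the clock.

```python
import statistics
import time


def time_runs(fn, n_runs=10):
    """Run `fn` n_runs times and return (mean, sample std) in seconds."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)


# Example with a stand-in workload in place of dataset construction.
mean_s, std_s = time_runs(lambda: sum(range(10_000)), n_runs=10)
```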
We used the same lifting procedures and dataset construction for both SMCN and CIN, so construction times are identical. Results for the train and test times appear in Tables 4 and 5, respectively. In training time, SMCN incurs a computational overhead of approximately 23% on the MOLHIV benchmark and 38% on the ZINC benchmark compared to CIN, a trade-off for its improved predictive performance. Additionally, SMCN consistently surpasses subgraph networks in runtime, achieving a 2.9x training speedup on MOLHIV and a 1.3x speedup on ZINC. This improvement over subgraph networks stems from the fact that SMCN uses fewer subgraph updates and leverages higher-order topological information instead.
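The overhead and speedup figures quoted for training time can be re-derived from the mean per-epoch times in Table 4. The snippet below is only an arithmetic check; the dictionary layout and function names are illustrative.

```python
# Mean training time per epoch in seconds, copied from Table 4.
train_time = {
    "ZINC": {"SMCN": 7.39, "GNN-SSWL+": 9.65, "CIN": 5.35},
    "MOLHIV": {"SMCN": 17.70, "GNN-SSWL+": 51.02, "CIN": 14.34},
}


def overhead_vs_cin(dataset):
    """Relative extra training time of SMCN compared to CIN."""
    row = train_time[dataset]
    return row["SMCN"] / row["CIN"] - 1.0


def speedup_over_subgraph(dataset):
    """How many times faster SMCN trains than GNN-SSWL+."""
    row = train_time[dataset]
    return row["GNN-SSWL+"] / row["SMCN"]


for name in train_time:
    print(name,
          f"overhead vs CIN: {overhead_vs_cin(name):.0%}",
          f"speedup over GNN-SSWL+: {speedup_over_subgraph(name):.1f}x")
```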

Section: ACKNOWLEDGMENTS
The authors would like to thank Beatrice Bevilacqua for insightful discussions. YG is supported by the UKRI Engineering and Physical Sciences Research Council (EPSRC) CDT in Autonomous and Intelligent Machines and Systems (grant reference EP/S024050/1). FF is funded by the Andrew and Erna Finci Viterbi Post-Doctoral Fellowship. FF partly performed this work while visiting the Machine Learning Research Unit at TU Wien led by Prof. Thomas Gärtner. MB is supported by EPSRC Turing AI World-Leading Research Fellowship No. EP/X040062/1 and EPSRC AI Hub on Mathematical Foundations of Intelligence: An "Erlangen Programme" for AI No. EP/Y028872/1. HM is the Robert J. Shillman Fellow, and is supported by the Israel Science Foundation through a personal grant (ISF 264/23) and an equipment grant (ISF 532/23).

Section: 
Published as a conference paper at ICLR 2025
In the context of shape detection, a natural choice for g is the average shortest-path distance $g(v) = \frac{1}{|V|} \sum_{u \in V} d(u, v)$, as it depends only on the graph structure and is thus invariant to geometric transformations of the features. As for the covering $\mathcal{U}$, a natural choice is $\{(\eta \cdot i, \eta \cdot i + \epsilon)\}_{i \in \mathbb{N}}$ for hyper-parameters $\eta, \epsilon \in \mathbb{R}$. We now formally state Proposition 4.4 for MOG pooling.
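To make the lens function and the covering concrete, the following sketch computes the average shortest-path distance g with breadth-first search and lists which cover intervals contain a given value. It assumes a connected, unweighted graph given as an adjacency dictionary, and it indexes the intervals from i = 0; both the function names and these conventions are assumptions made for illustration.

```python
from collections import deque


def average_spd(adj):
    """g(v) = (1/|V|) * sum over u of d(u, v), for a connected graph."""
    n = len(adj)
    g = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        g[source] = sum(dist.values()) / n
    return g


def cover_indices(value, eta, eps, max_index):
    """Indices i in [0, max_index] with value in (eta*i, eta*i + eps)."""
    return [i for i in range(max_index + 1)
            if eta * i < value < eta * i + eps]


# Path graph 0 - 1 - 2: the two end nodes are automorphic and share a value.
path = {0: [1], 1: [0, 2], 2: [1]}
g = average_spd(path)
```

In this small example the two end nodes receive the same value of g, so they fall into the same cover intervals regardless of η and ϵ.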
Proposition B.17. There exist pairs of graphs G and G ′ such that the combinatorial complexes X = MOG(G) and X ′ = MOG(G ′ ) are indistinguishable by HOMP. This occurs despite the fact that the (0, 1, 2) cross-diameters of the two CCs are different.
Proof. We begin with a base example. Let G and G ′ be the two graphs depicted at the bottom of Figure 11 before applying the MOG lifting procedure, and denote their node sets by S = {s 1 , . . . , s 6 } and S ′ = {s ′ 1 , . . . , s ′ 6 }, with the order as shown in the figure. Note that the sets P 1 = {s 1 , s 2 , s 5 , s 6 } and P 2 = {s 3 , s 4 } form a partition of S, and nodes within the same partition set are automorphic (i.e., there exists a graph automorphism mapping one node to the other). The same holds for
Thus, the function g, defined as the average shortest path distance (SPD) of each node, is constant on each of these sets. This implies that by choosing a sufficiently fine covering (i.e. selecting a small enough η), the 2-cells defined by the MOG algorithm for graphs G and G ′ will be
} respectively (We split sets P 1 and P ′ 1 to their connected components). Defining X = MOG(G) and X ′ = MOG(G ′ ), we now aim to show that HOMP cannot distinguish between these two CCs. Figure 11 depicts X , X ′ and an additional combinatorial complex X which covers both of these complexes. Since X and X ′ are connected and of the same size, Theorem B.1 shows that these complexes are indistinguishable by HOMP. Another quick calculation shows that diam 2 A0,1 (X ) = 3 while diam 2 A0,1 (X ′ ) = 2 concluding the base example.
To demonstrate that this phenomenon occurs in larger graphs where the MOG procedure produces a small number of 2-cells, we expand upon the aforementioned example. For each integer n ≥ 3 , let Cyc(n) denote the cyclic graph of length n with the node set
, where × denotes the Cartesian graph product. We now demonstrate that the proof above remains valid for G n and G ′ n for any n ≥ 3 . First, defining P 1,n = P 1 × V n , P 2,n = P 2 × V n , where × here denotes the Cartesian product of sets, we notice that since all nodes in P i are G isomorphic, and all nodes in V n
Note that this is the exact message used in HOMP tensor diagrams. If the label is "equiv", the message is computed as described in Equation 50. By slight abuse of notation, we often denote the collection of multi-cellular cochains associated with the nodes in layer t by h (t) . For all other labels, the message follows the standard tensor diagram update process. The last layer of the tensor diagram contains a single node representing a final readout layer (see Appendix G.1 for implementation details of both the update computation and the possible readout layers used in this paper).

Section: D SCALABLE MULTI-CELLULAR NETWORKS
In this section, we provide an in-depth description of the SMCN model presented in Section 6. To construct SMCN, we augment MCN diagrams, reducing the number of possible node labels and adding a new "SCL" update. Figure 5 depicts an SMCN tensor diagram. We now describe the components of the SMCN scheme. For the rest of this section, we borrow the notation of Appendix C.
Diagram. Like HOMP and MCN tensor diagrams, SMCN tensor diagrams are layered directed graphs with labeled nodes and edges. Each node is labeled by a multi-cellular cochain space C k , but unlike MCN we restrict k such that k r ≤ 2, thus reducing memory complexity. As before, directed edges whose source and target nodes are labeled by C er can be labeled by any neighborhood function, while edges between nodes labeled by other types of multi-cellular cochain spaces can be labeled with the new label "equiv". Additionally, directed edges whose source and target nodes are labeled by C er+e r ′ can be labeled by "SCL".
Input. The input to the SMCN model is identical to the input to the MCN model. As we will now show, the matrix form of the neighborhood functions B r1,r2 plays a role similar to that of node marking policies in subgraph networks (Bevilacqua et al., 2021; Frasca et al., 2022; Zhang et al., 2023b). Subgraph networks process feature maps of the form h : V × V → R d , where V is a set of nodes. Node marking policies employ an initial feature map of:
Similarly, SCL updates process cochains of the form h : X r1 × X r2 → R d , and the matrix form of B r1,r2 satisfies:
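The analogy can be illustrated with a small sketch. It assumes the usual node marking policy, in which the initial feature of the pair (u, v) is the indicator of u = v, and it reads the matrix form of B r1,r2 off Equation 2, so that the entry for (x, y) is 1 exactly when x ⊆ y. Cells are represented as frozensets of node ids, and the function names are hypothetical.

```python
def node_marking(num_nodes):
    """Initial feature map on V x V: 1 on the diagonal, 0 elsewhere."""
    return [[1 if u == v else 0 for v in range(num_nodes)]
            for u in range(num_nodes)]


def incidence_matrix(cells_r1, cells_r2):
    """Matrix form of B_{r1,r2}: entry [i][j] is 1 iff cells_r1[i] <= cells_r2[j]."""
    return [[1 if x <= y else 0 for y in cells_r2] for x in cells_r1]


# Four nodes and a single 2-cell spanning nodes 0, 1 and 2.
nodes = [frozenset({i}) for i in range(4)]
two_cells = [frozenset({0, 1, 2})]
B_02 = incidence_matrix(nodes, two_cells)
```

Just as node marking tags the pair (v, v) inside the subgraph rooted at v, the incidence entries tag the pairs (x, y) in which the cell x is contained in the cell y.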
Update. The SMCN update is computed in the same way as the MCN update (see Appendix C), where the message of a directed edge (u, v) labeled by "SCL" is computed by
$$m_{u,v}(x, y) = \sum_{r, r' = 0}^{\ell} \mathrm{MLP}_{r,r'}\Big( h^{(t)}_{x,y},\; h^{(t)}_{(co)A_{r_1,r}(x),\,y},\; h^{(t)}_{x,\,(co)A_{r_2,r'}(y)},\; h^{(t)}_{x,\,B_{r_1,r_2}(x)},\; h^{(t)}_{B^{\top}_{r_2,r_1}(y),\,y} \Big), \qquad (55)$$
where, if $Q_1 \subseteq X_{r_1}$ and $Q_2 \subseteq X_{r_2}$ are sets of cells, $h_{Q_1,y} := \sum_{x' \in Q_1} h_{x',y}$ and $h_{x,Q_2} := \sum_{y' \in Q_2} h_{x,y'}$. The SCL update can be considered an aggregation of subgraph updates on the augmented Hasse graphs induced by the input CC. For each choice of $r, r'$, the computation inside the aggregation function in Equation 55 is identical to a CS-GNN (Bar-Shalom et al., 2024) update on the augmented Hasse graph (see Definition 6.1) $H_{(co)A_{r_1,r}}$, where the set of "super-nodes" is $X_{r_2}$ and the connectivity of the super-nodes is given by the Hasse graph $H_{(co)A_{r_2,r'}}$. More specifically, for $r_1 = r_2 = 0$ and $r = r' = 1$ we recover the GNN-SSWL+ (Zhang et al., 2023b) update.
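The set-indexed terms h Q1,y and h x,Q2 are plain sums over one index of the pairwise feature map. The sketch below illustrates them for scalar features in the case r 1 = r 2 = 0, where both neighborhood functions reduce to graph adjacency. It only gathers the first three arguments passed to the MLP and is not an implementation of the full SCL layer; all names are hypothetical.

```python
def aggregate_first(h, Q1, y):
    """h_{Q1,y}: sum of h[x'][y] over x' in Q1."""
    return sum(h[x][y] for x in Q1)


def aggregate_second(h, x, Q2):
    """h_{x,Q2}: sum of h[x][y'] over y' in Q2."""
    return sum(h[x][y] for y in Q2)


def scl_inputs(h, nbrs_first, nbrs_second, x, y):
    """Gather h_{x,y}, h_{N(x),y} and h_{x,N(y)} for the pair (x, y)."""
    return (
        h[x][y],
        aggregate_first(h, nbrs_first[x], y),
        aggregate_second(h, x, nbrs_second[y]),
    )


# Path graph 0 - 1 - 2 with the toy feature map h[x][y] = 10 * x + y.
nbrs = {0: [1], 1: [0, 2], 2: [1]}
h = [[10 * x + y for y in range(3)] for x in range(3)]
inputs = scl_inputs(h, nbrs, nbrs, 1, 2)
```

For the pair (1, 2) the gathered values are h[1][2], the sum h[0][2] + h[2][2] over the neighbors of node 1, and h[1][1] from the single neighbor of node 2.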
Computational complexity. The computational complexity of SMCN depends on the choice of multi-cellular cochain spaces in the tensor diagram. For $C^{e_{r_1}+e_{r_2}} \to C^{e_{r_1}+e_{r_2}}$ updates, the worst-case computational complexity for the most general model is $O(\ell^2 \cdot d \cdot n_{r_1} \cdot n_{r_2})$, where d is the maximal degree w.r.t. any neighborhood function. In our experiments, we use tensor diagrams containing a single type of multi-cellular cochain space, resulting in models with a runtime complexity
and a 1.3x speedup on ZINC. This improvement over subgraph networks stems from the fact that SMCN uses fewer subgraph updates and leverages higher-order topological information instead.


References:
[b0] Ralph Abboud; İsmail İlkan Ceylan; Martin Grohe; Thomas Lukasiewicz (2020). The surprising power of graph neural networks with random node initialization. 
[b1] Waiss Azizian; Marc Lelarge (2020). Expressive power of invariant and equivariant graph neural networks. 
[b2] Rubén Ballester; Pablo Hernández-García; Mathilde Papillon; Claudio Battiloro; Nina Miolane; Tolga Birdal; Carles Casacuberta; Sergio Escalera; Mustafa Hajij (2024). Attending to topological spaces: The cellular transformer. 
[b3] Jacob Bamberger (2022). A topological characterisation of weisfeiler-leman equivalence classes. PMLR
[b4] Guy Bar-Shalom; Beatrice Bevilacqua; Haggai Maron (2023). Subgraphormer: Subgraph GNNs meet graph transformers. 
[b5] Guy Bar-Shalom; Yam Eitan; Fabrizio Frasca; Haggai Maron (2024). A flexible, equivariant framework for subgraph gnns via graph products and graph coarsening. 
[b6] Beatrice Bevilacqua; Fabrizio Frasca; Derek Lim; Balasubramaniam Srinivasan; Chen Cai; Gopinath Balamurugan; Michael M Bronstein; Haggai Maron (2021). Equivariant subgraph aggregation networks. 
[b7] Beatrice Bevilacqua; Moshe Eliasof; Eli Meirom; Bruno Ribeiro; Haggai Maron (2023). Efficient subgraph gnns by learning effective selection policies. 
[b8] Lukas Biewald (2020). Experiment tracking with weights and biases. 
[b9] Cristian Bodnar; Fabrizio Frasca; Nina Otter; Yu Guang Wang; Pietro Liò; Guido Montufar; Michael M Bronstein (2021). Weisfeiler and lehman go cellular: CW networks. 
[b10] Cristian Bodnar; Fabrizio Frasca; Yuguang Wang; Nina Otter; Guido F Montufar; Pietro Liò; Michael Bronstein (2021-07-24). Weisfeiler and lehman go topological: Message passing simplicial networks. PMLR
[b11] Giorgos Bouritsas; Fabrizio Frasca; Stefanos Zafeiriou; Michael M Bronstein (2022). Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Transactions on Pattern Analysis and Machine Intelligence
[b12] Michael Bronstein (2021-12). Using subgraphs for more expressive gnns. 
[b13] Michael M Bronstein; Joan Bruna; Taco Cohen; Petar Veličković (2021). Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. 
[b14] Davide Buffelli; Farzin Soleymani; Bastian Rieck (2024). Cliqueph: Higher-order information for graph neural networks through persistent homology on clique graphs. 
[b15] Yuzhou Chen; Baris Coskunuzer; Yulia Gel (2021). Topological relational learning on graphs. Advances in neural information processing systems
[b16] Leonardo Cotta; Christopher Morris; Bruno Ribeiro (2021). Reconstruction for powerful graph representations. Advances in Neural Information Processing Systems
[b17] Tamal K Dey; Facundo Mémoli; Yusu Wang (2016). Multiscale mapper: Topological summarization via codomain covers. SIAM
[b18] Vijay Prakash Dwivedi; Chaitanya K Joshi; Anh Tuan Luu; Thomas Laurent; Yoshua Bengio; Xavier Bresson (2023). Benchmarking graph neural networks. Journal of Machine Learning Research
[b19] Matthias Fey; Jan Eric Lenssen (2019). Fast graph representation learning with pytorch geometric. 
[b20] Fabrizio Frasca; Beatrice Bevilacqua; Michael Bronstein; Haggai Maron (2022). Understanding and extending subgraph gnns by rethinking their symmetries. Advances in Neural Information Processing Systems
[b21] Justin Gilmer; Samuel S Schoenholz; Patrick F Riley; Oriol Vinyals; George E Dahl (2017). Neural message passing for quantum chemistry. PMLR
[b22] Lorenzo Giusti; Teodora Reu; Francesco Ceccarelli; Cristian Bodnar; Pietro Liò (2023). Cin++: Enhancing topological message passing. 
[b23] Aric Hagberg; Pieter Swart; Daniel S Chult (2008). Exploring network structure, dynamics, and function using networkx. 
[b24] Mustafa Hajij; Bei Wang; Paul Rosen (2018). Mog: Mapper on graphs for relationship preserving clustering. 
[b25] Mustafa Hajij; Kyle Istvan; Ghada Zamzmi (2020). Cell complex neural networks. 
[b26] Mustafa Hajij; Ghada Zamzmi; Theodore Papamarkou; Nina Miolane; Aldo Guzmán-Sáenz; Karthikeyan Natesan Ramamurthy (2022). Higher-order attention networks. 
[b27] Mustafa Hajij; Ghada Zamzmi; Theodore Papamarkou; Nina Miolane; Aldo Guzmán-Sáenz; Karthikeyan Natesan Ramamurthy; Tolga Birdal; Tamal K Dey; Soham Mukherjee; Shreyas N Samaga (2022). Topological deep learning: Going beyond graph data. 
[b28] Mustafa Hajij; Mathilde Papillon; Florian Frantzen; Jens Agerberg; Ibrahem Aljabea; Ruben Ballester; Claudio Battiloro; Guillermo Bernárdez; Tolga Birdal; Aiden Brent (2024). Topox: a suite of python packages for machine learning on topological domains. 
[b29] Jason Hartford; Devon Graham; Kevin Leyton-Brown; Siamak Ravanbakhsh (2018). Deep models of interactions across sets. PMLR
[b30] Allen Hatcher (2002). Algebraic topology. Cambridge University Press
[b31] Max Horn; Edward De Brouwer; Michael Moor; Yves Moreau; Bastian Rieck; Karsten Borgwardt (2021). Topological graph neural networks. 
[b32] Weihua Hu; Matthias Fey; Marinka Zitnik; Yuxiao Dong; Hongyu Ren; Bowen Liu; Michele Catasta; Jure Leskovec (2020). Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems
[b33] Stefanie Jegelka (2022). Theory of graph neural networks: Representation and learning. 
[b34] Jeroen Kazius; Ross Mcguire; Roberta Bursi (2005). Derivation and validation of toxicophores for mutagenicity prediction. Journal of medicinal chemistry
[b35] Nicolas Keriven; Gabriel Peyré (2019). Universal invariant and equivariant graph neural networks. Advances in Neural Information Processing Systems
[b36] Thomas Kipf; Max Welling (2016). Semi-supervised classification with graph convolutional networks. 
[b37] Derek Lim; Joshua Robinson; Lingxiao Zhao; Tess Smidt; Suvrit Sra; Haggai Maron; Stefanie Jegelka (2022). Sign and basis invariant networks for spectral graph representation learning. 
[b38] Liheng Ma; Chen Lin; Derek Lim; Adriana Romero-Soriano; Puneet K Dokania; Mark Coates; Philip Torr; Ser-Nam Lim (2023). Graph inductive biases in transformers without message passing. PMLR
[b39] Yao Ma; Suhang Wang; Charu C Aggarwal; Jiliang Tang (2019). Graph convolutional networks with eigenpooling. 
[b40] Haggai Maron; Heli Ben-Hamu; Nadav Shamir; Yaron Lipman (2018). Invariant and equivariant graph networks. 
[b41] Haggai Maron; Heli Ben-Hamu; Hadar Serviansky; Yaron Lipman (2019). Provably powerful graph networks. Advances in neural information processing systems
[b42] Christopher Morris; Martin Ritzert; Matthias Fey; William L Hamilton; Jan Eric Lenssen; Gaurav Rattan; Martin Grohe (2019). Weisfeiler and leman go neural: Higher-order graph neural networks. 
[b43] Christopher Morris; Nils M Kriege; Franka Bause; Kristian Kersting; Petra Mutzel; Marion Neumann (2020). Tudataset: A collection of benchmark datasets for learning with graphs. 
[b44] Christopher Morris; Yaron Lipman; Haggai Maron; Bastian Rieck; Nils M Kriege; Martin Grohe; Matthias Fey; Karsten Borgwardt (2023). Weisfeiler and leman go machine learning: The story so far. The Journal of Machine Learning Research
[b45] Theodore Papamarkou; Tolga Birdal; Michael M Bronstein; Gunnar E Carlsson; Justin Curry; Yue Gao; Mustafa Hajij; Roland Kwitt; Pietro Liò; Paolo Di Lorenzo (2024). Position: Topological deep learning is the new frontier for relational learning. 
[b46] Adam Paszke; Sam Gross; Francisco Massa; Adam Lerer; James Bradbury; Gregory Chanan; Trevor Killeen; Zeming Lin; Natalia Gimelshein; Luca Antiga (2019). Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems
[b47] Omri Puny; Derek Lim; Bobak Kiani; Haggai Maron; Yaron Lipman (2023). Equivariant polynomials for graph neural networks. PMLR
[b49] Bastian Rieck (2023). On the expressivity of persistent homology in graph learning. 
[b50] Kaspar Riesen; Horst Bunke (2008). Iam graph database repository for graph based pattern recognition and machine learning. Springer
[b51] Gurjeet Singh; Facundo Mémoli;  Gunnar E Carlsson (2007). Topological methods for the analysis of high dimensional data sets and 3d object recognition. PBG@ Eurographics
[b52] Teague Sterling; John J Irwin (2015). Zinc 15-ligand discovery for everyone. Journal of chemical information and modeling
[b53] Yogesh Verma; Amauri H Souza; Vikas Garg (2024). Topological neural networks go persistent, equivariant, and continuous. 
[b54] Yanbo Wang; Muhan Zhang (2024). An empirical study of realized gnn expressiveness. 
[b55] Boris Weisfeiler; Andrei Leman (1968). The reduction of a graph to canonical form and the algebra which appears therein. nti, Series
[b56] Keyulu Xu; Weihua Hu; Jure Leskovec; Stefanie Jegelka (2018). How powerful are graph neural networks?. 
[b57] Zuoyu Yan; Tengfei Ma; Liangcai Gao; Zhi Tang; Chao Chen; Yusu Wang (2024). Cycle invariant positional encoding for graph representation learning. PMLR
[b58] Rex Ying; Jiaxuan You; Christopher Morris; Xiang Ren; William L Hamilton; Jure Leskovec (2018). Hierarchical graph representation learning with differentiable pooling. 
[b59] Jiaxuan You; Jonathan M Gomes-Selman; Rex Ying; Jure Leskovec (2021). Identity-aware graph neural networks. 
[b60] Chulhee Yun; Suvrit Sra; Ali Jadbabaie (2019). Small relu networks are powerful memorizers: a tight analysis of memorization capacity. Advances in Neural Information Processing Systems
[b61] Manzil Zaheer; Satwik Kottur; Siamak Ravanbakhsh; Barnabas Poczos; Russ R Salakhutdinov; Alexander J Smola (2017). Deep sets. Advances in neural information processing systems
[b62] Bingxu Zhang; Changjun Fan; Shixuan Liu; Kuihua Huang; Xiang Zhao; Jincai Huang; Zhong Liu (2023). The expressive power of graph neural networks: A survey. 
[b63] Bohang Zhang; Guhao Feng; Yiheng Du; Di He; Liwei Wang (2023). A complete expressiveness hierarchy for subgraph gnns via subgraph weisfeiler-lehman tests. PMLR
[b64] Bohang Zhang; Shengjie Luo; Liwei Wang; Di He (2023). Rethinking the expressive power of gnns via graph biconnectivity. 
[b65] Bohang Zhang; Lingxiao Zhao; Haggai Maron (2024). On the expressive power of spectral invariant graph neural networks. 
[b66] Muhan Zhang; Pan Li (2021). Nested graph neural networks. Advances in Neural Information Processing Systems

Figures:
Figure fig_0: 1
Type: figure
Caption: Figure 1: Pairs of HOMP-indistinguishable complexes that differ in fundamental metric/topological properties. In Figure 1(a), tori with different diameters (top 20, bottom 22); in Figure 1(b), a Möbius strip and a cylinder differing in both orientability and planarity; in Figure 1(c), a torus and a pair of disconnected tori, which have different homology groups.
Data: 

Figure fig_1: 
Type: figure
Caption: Figure 2: HOMP Tensor diagram.
Data: 

Figure fig_2: 3
Type: figure
Caption: Figure 3: Cyl h,2p covers both Cyl h,p and Möb h,p .
Data: 

Figure fig_3: 4
Type: figure
Caption: Figure 4: Cylinders are planar.
Data: 

Figure fig_4: 8
Type: figure
Caption: Figure 8: Boundary 1-cells.
Data: 

Figure fig_5: 
Type: figure
Caption: e. T p1,...,p ℓ and T p ′ 1 ,...,p ′ ℓ have the same number of 0-cells) and ∀j ∈ {1, . . . , ℓ}, p j , p ′ j ≥ 3, then for every HOMP model M, M(T p1,...,p ℓ ) = M(T p ′ 1 ,...,p ′ ℓ ).Proof. Both T p1,...,p ℓ and T p ′ 1 ,...,p ′ ℓ are connected, have the same number of 0-cells ((T p1,...,p ℓ
Data: 

Figure fig_6: 
Type: figure
Caption: and are covered by T p1•p ′ i ,...,p ℓ •p ′ ℓ . Therefore, Theorem B.1 implies that T p1,...,p ℓ and T p ′ 1 ,...,p ′ ℓ are indistinguishable by HOMP.
Data: 

Figure fig_7: 
Type: figure
Caption: Proof.Figure 10: A pair of indistinguishable CCs produced by triangular lifting. The left-hand CC covers each connected component of the right-hand CC.
Data: 

Figure fig_8: 
Type: figure
Caption: Pooling.For the pooling example we focus on the Mapper algorithm Singh et al. (2007); Hajij et al. (2018); Dey et al. (
Data: 

Figure fig_9: 
Type: figure
Caption: )It was shown inHajij et al. (2022b)  that a pair of CCs is isomorphic if and only if their corresponding Hasse graphs are isomorphic. Therefore, in our case, H and H ′ are non-isomorphic graphs. Since any pair of non-isomorphic graphs of size n are n-WL distinguishable, and k-IGN networks can distinguish between any pair of k-WL indistinguishable graphs (seeMaron et al. (2019)), it is enough to prove that there exists a MCN model which is able to simulate any k-IGN network on the Hasse graphs. Let A be the adjacency matrix of H and define n = |X |, n r = |X r | for all r ∈ {0, . . . , ℓ}.
Data: 

Figure fig_10: 
Type: figure
Caption: Figure 13: Custom HOMP.
Data: 

Figure fig_12: 14
Type: figure
Caption: Figure 14: The experiments use two types of SMCN tensor diagrams. Sequential diagrams (14(a)) stack CIN blocks and SCL updates; parallel diagrams (14(b)) perform SCL updates and CIN blocks in parallel.
Data: 

Figure tab_2: 1
Type: table
Caption: SMCN outperforms MPNNs, HOMP and expressive GNNs on graph regression and classification tasks. SMCN results are reported over 5 runs with seeds 1-5.
Data:
Model | Reference | ZINC MAE (↓) | MOLHIV ROC-AUC (↑) | MOLESOL RMSE (↓)
GCN | Kipf & Welling (2016) | 0.321 ± 0.009 | 76.06 ± 0.97 | 1.114 ± 0.036
GIN | Xu et al. (2018) | 0.163 ± 0.004 | 75.58 ± 1.40 | 1.173 ± 0.057
CIN | Bodnar et al. (2021a) | 0.079 ± 0.006 | 80.94 ± 0.57 | 1.288 ± 0.026
CIN++ | Giusti et al. (2023) | 0.077 ± 0.004 | 80.63 ± 0.94 | -
CIN + CycleNet | Yan et al. (2024) | 0.068 | - | -
Cellular Transformer | Ballester et al. (2024) | 0.080 | 79.46 | -
PPGN | Maron et al. (2019) | 0.079 ± 0.005 | - | -
PPGN++ (6) | Puny et al. (2023) | 0.071 ± 0.001 | - | -
DS-GNN | Bevilacqua et al. (2023) | 0.087 ± 0.003 | 76.54 ± 1.37 | 0.847 ± 0.015
DSS-GNN | Bevilacqua et al. (2021) | 0.102 ± 0.003 | 76.78 ± 1.66 | -
SUN | Frasca et al. (2022) | 0.083 ± 0.003 | 80.03 ± 0.55 | -
GNN-SSWL | Zhang et al. (2023b) | 0.082 ± 0.003 | - | -
GNN-SSWL+ | Zhang et al. (2023b) | 0.070 ± 0.005 | 79.58 ± 0.35 | 0.837 ± 0.019
Subgraphormer | Bar-Shalom et al. (2023) | 0.067 ± 0.007 | 80.38 ± 1.92 | 0.832 ± 0.043
Subgraphormer + PE | Bar-Shalom et al. (2023) | 0.063 ± 0.001 | 79.48 ± 1.28 | 0.826 ± 0.010
SMCN (ours) | This paper | 0.060 ± 0.004 | 81.16 ± 0.90 | 0.809 ± 0.037

Figure tab_3: 2
Type: table
Caption: Accuracy and normalized MSE scores of predicting the crossdiameter and the second Betti number of lifted ZINC graphs.
Data: ModelCross-diameter2nd Betti numberAccuracy (↑) / MSE (↓)

Figure tab_5: 
Type: table
Caption: The k-IGN architecture was proven in Maron et al. (2019) to be as expressive as the k-WL test, extending the capabilities of MPNNs, which were shown to possess expressivity equivalent to the 1-WL test. Despite this, k-IGNs have a runtime complexity of O(n k ), making them impractical to use. To address this, other expressive GNNs have been proposed, offering a balance between the computational complexity and expressive power of 3-IGNs and MPNNs. One such family of architectures is subgraph neural networks.
Subgraph neural networks. Subgraph neural networks (You et al., 2021; ?; Cotta et al., 2021; Bevilacqua et al., 2021) rely on a predefined policy that transforms an input graph into a set of graphs, with each graph in the set representing an augmented version of the original. Some policies include node deletion, where each graph in the set is created by removing a single node from the original graph; k-ego policies, where each graph is generated by extracting the k-neighborhood of a specific node; and node marking, where each graph is obtained by assigning a unique node feature to a single node in the original graph. Subgraph GNNs then process sets of graphs by independently applying MPNN updates to each graph in the set, while also incorporating cross-graph updates to exchange information between the graphs. Bevilacqua et al. (2021) have shown that subgraph GNNs are strictly more expressive than MPNNs, while Frasca et al. (2022) have shown they are strictly less expressive than 3-IGNs. With a runtime complexity of O(d • n 2 ), where d is the maximum degree of the input graph, subgraph GNNs offer a compelling trade-off: they are more expressive than MPNNs while remaining more scalable than 3-IGNs. Subgraph GNNs have demonstrated strong empirical performance in studies such as Bevilacqua et al. (2021), Frasca et al. (2022), and Zhang et al. (2023b), among others, establishing them as a robust and effective choice for GNN architectures. 
For an in depth discussion on subgraph neural networks seeBronstein (2021).
Data: 

Figure tab_6: 
Type: table
Caption: .1 A TOPOLOGICAL CRITERION FOR HOMP INDISTINGUISHABILITY In this section we formally restate and prove Theorem 4.2.
Data: GFHECDAB

Figure tab_7: 
Type: table
Caption: Using h, we can adjust the proof of Proposition F.5 by summing in Equation 79 only over 1-cells for which h x = 0, resulting in the number of connected components of X with no boundary. This shows that SMCN can distinguish CCs based on either one of the aforementioned three properties, concluding the proof.F.2 LIFTING AND POOLINGIn this section, we rigorously state and prove Proposition 6.6 which appears in Section 6.1 for graph triangular lifting and for the MOG (graph Mapper) pooling algorithm. Corresponding results for the HOMP case can be found in Appendix B.3. We start with triangular lifting (Definition B.14). There exist pairs of graphs G and G ′ such that the combinatorial complexes X = 3-CL(G) and X ′ = 3-CL(G ′ ) are indistinguishable by HOMP, but can be distinguished by an SMCN model with asymptotic runtime O(m deg • n 0 • n 2 • T ) where n 0 is the number of nodes in the original graph, n 2 is the number of 2-rank cells constructed by triangular lifting ,m deg is the maximal degree and T is the number of layers.
Data: is in the same connected component as a boundary edge 0 otherwise.(81)Proposition F.8.

Figure tab_8: 3
Type: table
Caption: Accuracy on the MUTAG dataset.
Data: ModelAccuracy (↑)GIN

Figure tab_9: 4
Type: table
Caption: Train time (seconds per epoch). Time measurements from training each of the models. Measures are averaged across 10 epochs, starting from epoch number 20.For MOLESOL the architecture consists of two custom HOMP blocks, and two parallel 1-SCL layers. HOMP blocks have an embedding dimension of 16 while SCL layers have an embedding dimension of 228 where neither block uses dropout. The final SCL block is followed by a non learnable pooling operation as per Equation
Data:
Dataset | SMCN | GNN-SSWL+ | CIN
ZINC | 7.39 ± 0.17 | 9.65 ± 0.19 | 5.35 ± 0.33
MOLHIV | 17.70 ± 0.42 | 51.02 ± 0.25 | 14.34 ± 0.27

Figure tab_10: 5
Type: table
Caption: Test time (seconds per full test set inference). Time measurements for inference over the entire test set. Results are averaged over 10 runs.
Data:
Dataset | SMCN | GNN-SSWL+ | CIN
ZINC | 0.93 ± 0.08 | 1.04 ± 0.03 | 0.71 ± 0.05
MOLHIV | 2.09 ± 0.15 | 3.07 ± 0.03 | 2.02 ± 0.12


Formulas:
Formula formula_0: $A_{r_1,r_2}(x) = \{y \in X_{r_1} \mid \exists z \in X_{r_2} \text{ s.t. } x, y \subseteq z\}, \quad coA_{r_1,r_2}(x) = \{y \in X_{r_1} \mid \exists z \in X_{r_2} \text{ s.t. } z \subseteq x, y\}, \quad (1)$

Formula formula_1: $B_{r_1,r_2}(x) = \{y \in X_{r_2} \mid x \subseteq y\}, \quad B^{\top}_{r_1,r_2}(x) = \{y \in X_{r_2} \mid y \subseteq x\}, \quad (2)$

Formula formula_2: B r1,r2 (x) = B ⊤ r1,r2 (x) = ∅ for x / ∈ X r1 .

Formula formula_3: C 0 C 2 A 0,1 B 0,2 B ⊤ 2,0(

Formula formula_4: h (t+1) x = β   k i=1 y∈Ni(x) MLP (t) i,rk(x) h (t) x , h (t) y   ,(3)

Formula formula_5: (t)

Formula formula_6: C e0 C e2 C e0+e1 C 2e0 C 2e0+e1 C 3e0

Formula formula_7: h k : X k0 0 × • • • × X k ℓ ℓ → R d .

Formula formula_8: (A h ) i0,...,i ℓ ,: = h x 0 (i0)1 , . . . , x 0 (i0) k 0 , . . . , x ℓ (i ℓ )1 , . . . , x ℓ (i ℓ ) k ℓ (4)

Formula formula_9: R n k 0 0 ×•••×n k ℓ ℓ ×d . The group G = S n0 × • • • × S n ℓ acts on h ∈ C k by (σ • h)(x 0 , . . . , x ℓ ) = ((σ 0 , . . . , σ ℓ ) • h)(x 0 , . . . , x ℓ ) = h(σ 0 • x 0 , . . . , σ ℓ • x ℓ ),(5)

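Formula formula_10: $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d}$, we utilize the basis of equivariant linear layers $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d} \to \mathbb{R}^{n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$

Formula formula_11: $C^{k} \to C^{k'}$. Using this basis, denoted $\{L_\gamma\}_{\gamma \in \Gamma(k,k',d,d')}$ (here $\Gamma(k,k',d,d')$

Formula formula_12: $F(h) = \beta\Big(\sum_{\gamma \in \Gamma(k,k',d,d')} w_\gamma L_\gamma(A_h)\Big)$, (6)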

Formula formula_13: $h^{(v)}_x = \bigotimes_{u \in \mathrm{pred}(v)} m_{u,v}(x)$

Formula formula_14: [Figure 6: $H_{\mathrm{co}A_{2,1}}$.]

Formula formula_15: {X x } x∈X2 {H x A0,1 } x∈X2 ∈ C e0+e2 (X )

Formula formula_16: m u,v (x, y) = ℓ r=0,r ′ =0 MLP r,r ′ h (t) x,y , h(t)

Formula formula_18: )A r 2 ,r ′ (y) , h x,Br 1 ,r 2 (x) , h B ⊤ r 2 ,r 1 (y),y ,

Formula formula_19: $O(m_{\deg} \cdot n_0 \cdot n_2)$

Formula formula_20: $\sigma \cdot T_{i,j,k} = T_{\sigma^{-1}(i),\sigma^{-1}(j),k}, \quad \sigma \in S_n$. (7)
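The permutation action in Equation (7), and the equivariance condition $L(\sigma \cdot T) = \sigma \cdot L(T)$ it leads to, can be checked numerically. The sketch below is illustrative only: it assumes a tensor is stored as a nested list `T[i][j]` holding a feature list, represents $\sigma$ as a list with `sigma[i]` equal to $\sigma(i)$, and uses a hypothetical "row-sum" layer as one simple example of an equivariant linear map. None of these names come from the paper.

```python
def act(sigma, T):
    """(sigma . T)[i][j] = T[sigma^{-1}(i)][sigma^{-1}(j)], features untouched."""
    n = len(sigma)
    inv = [0] * n
    for i, s in enumerate(sigma):
        inv[s] = i  # inv[sigma(i)] = i, so inv is sigma^{-1}
    return [[T[inv[i]][inv[j]] for j in range(n)] for i in range(n)]

def row_sum_layer(T):
    """Example equivariant map: every entry (i, j) receives the sum of row i."""
    n, c = len(T), len(T[0][0])
    sums = [[sum(T[i][j][k] for j in range(n)) for k in range(c)] for i in range(n)]
    return [[list(sums[i]) for _ in range(n)] for i in range(n)]

def compose(s1, s2):
    """Composition (s1 s2)(i) = s1(s2(i))."""
    return [s1[s2[i]] for i in range(len(s1))]
```

With this convention, `act(s1, act(s2, T))` agrees with `act(compose(s1, s2), T)`, so `act` is a left group action, and `row_sum_layer` commutes with it.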

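Formula formula_21: $\sigma \cdot T_{i_1,\ldots,i_k,j} = T_{\sigma^{-1}(i_1),\ldots,\sigma^{-1}(i_k),j}$. (8)

Formula formula_22: $\mathbb{R}^{n^{k} \times c} \to \mathbb{R}^{n^{k'} \times c'}$ which satisfy $L(\sigma \cdot T) = \sigma \cdot L(T)$.

Formula formula_23: $U(T) = \beta\Big(\sum_{\gamma \in \Gamma} w_\gamma L_\gamma(T)\Big)$ (10)

Formula formula_24: $\mathbb{R}^{n^{k_1} \times c_1}$ to $\mathbb{R}^{n^{k_2} \times c_2}$ for some $k_1, k_2 \le k$ and $c_1, c_2 \in \mathbb{N}$,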

Formula formula_25: $| = |X'_0|$. If $X$ and $X'$ admit decompositions into connected components $X = \bigsqcup_{Z \in C(X)} Z$, $X' = \bigsqcup_{Z' \in C(X')} Z'$, (11)

Formula formula_27: $\tilde{h}^{(t)}_{x'} = h^{(t)}_{\rho(x')}$, for $t = 0, \ldots, T$, $x' \in \tilde{X}$.

Formula formula_28: (0)

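Formula formula_29: $h^{(t+1)}_x = \beta\Big(\bigotimes_{N \in N_{\mathrm{nat}}} \bigoplus_{y \in N(x)} \mathrm{MLP}^{(t)}_{N,\mathrm{rk}(x)}\big(h^{(t)}_x, h^{(t)}_y\big)\Big), \quad \tilde{h}^{(t+1)}_{x'} = \beta\Big(\bigotimes_{N \in N_{\mathrm{nat}}} \bigoplus_{y' \in N(x')} \mathrm{MLP}^{(t)}_{N,\mathrm{rk}(x')}\big(\tilde{h}^{(t)}_{x'}, \tilde{h}^{(t)}_{y'}\big)\Big)$. (12)

Formula formula_31: $\bigoplus_{y' \in N(x')} \mathrm{MLP}^{(t)}_{N,\mathrm{rk}(x')}\big(\tilde{h}^{(t)}_{x'}, \tilde{h}^{(t)}_{y'}\big) = \bigoplus_{y \in N(\rho(x'))} \mathrm{MLP}^{(t)}_{N,\mathrm{rk}(\rho(x'))}\big(h^{(t)}_{\rho(x')}, h^{(t)}_y\big)$. (13)

Formula formula_32: $\tilde{h}^{(t+1)}_{x'} = h^{(t+1)}_{\rho(x')}$. Lemma B.3. If $X$ is connected and $\rho : \tilde{X} \to X$ is a covering map, then $\forall x \in X$, $|\rho^{-1}(x)| = \frac{|\tilde{X}_0|}{|X_0|}$.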

Formula formula_33: $x'_1, x'_2 \in \rho^{-1}(x)$ we have $N(x'_1) \cap N(x'_2) \neq \emptyset$. If $z' \in N(x'_1) \cap N(x'_2)$, then there is a neighborhood function $N^* \in N_{\mathrm{nat}}$ such that $x'_1, x'_2 \in N^*(z')$. Given that $\rho(x'_1) = \rho(x'$

Formula formula_34: $x' \in \rho^{-1}(x)$ there exists a $y' \in N(x')$ such that $\rho(y') = y$. Since the set $\{N(x') \mid x' \in \rho^{-1}(x)\}$ is pairwise disjoint, this implies that $|\rho^{-1}(y)| \ge |\rho^{-1}(x)|$. Since $y \in N(x)$

Formula formula_35: $Z \in C(X)$, $Z' \in C(X')$ and every $z \in Z$, $z' \in Z'$ we have $h^{(T)}_z = \tilde{h}^{(T)}_y \ \forall y \in \rho_Z^{-1}(z)$, $h'^{(T)}_{z'} = \tilde{h}^{(T)}_y \ \forall y \in \rho_{Z'}^{-1}(z')$. (14)

Formula formula_37: $\{\!\{h^{(T)}_x \mid x \in X\}\!\}$, $\{\!\{h'^{(T)}_{x'} \mid x' \in X'\}\!\}$ and $\{\!\{\tilde{h}^{(T)}_y \mid y \in \tilde{X}\}\!\}$ respectively. Since each $Z$, $Z'$ is connected, we can use Lemma B.3 to get that $\forall z \in Z$, $\forall z' \in Z'$, $|\rho_Z^{-1}(z)| = \frac{|\tilde{X}_0|}{|Z_0|}$ and $|\rho_{Z'}^{-1}(z')| = \frac{|\tilde{X}_0|}{|Z'_0|}$. This implies that $\forall y \in \tilde{X}$, $n_y = \tilde{n}_y \cdot \Big(\sum_{Z \in C(X)} \frac{|Z_0|}{|\tilde{X}_0|}\Big)$, $n'_y = \tilde{n}_y \cdot \Big(\sum_{Z' \in C(X')} \frac{|Z'_0|}{|\tilde{X}_0|}\Big)$. (15) Since $\sum_{Z \in C(X)} |Z_0| = |X_0| = |X'_0| = \sum_{Z' \in C(X')} |Z'_0|$, this implies that $\forall y \in \tilde{X}$, $n_y = n'_y$.

Formula formula_38: $\{\!\{h^{(T)}_x \mid x \in X\}\!\}$ and $\{\!\{h'^{(T)}_{x'} \mid x' \in X'\}\!\}$

Formula formula_39: $S = [p_1] \times \cdots \times [p_\ell]$, (16)

Formula formula_40: $X_r = \{s_k \mid s \in S,\ k \in \{0,1\}^\ell,\ k_1 + \cdots + k_\ell = r\}$, (17)

Formula formula_41: $s_k = \{s + k' \mid k' \in \{0,1\}^\ell,\ k' \le k\}$.

Formula formula_42: $k' \le k$ if $k'_j \le k_j$, $\forall j \in \{1, \ldots, \ell\}$.
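The torus construction in Equations (16) and (17) can be sketched in a few lines. The code below is an illustration, not the authors' implementation: it assumes $[p_j] = \{0, \ldots, p_j - 1\}$ and that the addition $s + k'$ is taken coordinate-wise modulo $p_j$ (which is what makes the grid wrap around). The function name `torus_cells` is hypothetical. Each $p_j$ should be at least 3 so that distinct pairs $(s, k)$ give distinct cells.

```python
from itertools import product

def torus_cells(p):
    """Return {r: set of rank-r cells} for the torus T_{p_1,...,p_l}.

    A cell s_k is the set {s + k' mod p : k' <= k}, stored as a frozenset of
    coordinate tuples; its rank is k_1 + ... + k_l.
    """
    ell = len(p)
    cells = {r: set() for r in range(ell + 1)}
    for s in product(*[range(pj) for pj in p]):          # s ranges over S
        for k in product((0, 1), repeat=ell):            # k in {0,1}^l
            # all k' with k'_j <= k_j for every coordinate j
            lower = product(*[range(kj + 1) for kj in k])
            cell = frozenset(
                tuple((sj + kj) % pj for sj, kj, pj in zip(s, kp, p))
                for kp in lower
            )
            cells[sum(k)].add(cell)
    return cells
```

For example, `torus_cells((3, 4))` yields 12 vertices, 24 edges and 12 square faces, and every rank-$r$ cell contains $2^r$ vertices.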

Formula formula_43: $T_{p_1 \cdot p'_1, \ldots, p_\ell \cdot p'_\ell}$ covers both $T_{p_1,\ldots,p_\ell}$ and $T_{p'_1,\ldots,p'_\ell}$. Proof. Denote $p = (p_1, \ldots, p_\ell)$, $p' = (p'_1, \ldots, p'_\ell)$, $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_\ell) = (p_1 \cdot p'_1, \ldots, p_\ell \cdot p'_\ell)$.

Formula formula_44: $\rho(s) = s \bmod p, \quad \rho'(s) = s \bmod p'$, (19)

Formula formula_45: $s \in S$ and $k \in \{0,1\}^\ell$ such that $k_1 + \cdots + k_\ell = r$. Since $p < \tilde{p}$, for every $k' \le k$: $(s + k' \bmod \tilde{p}) \bmod p = (s \bmod p) + (k' \bmod p) = \rho(s) + k' \bmod p$. (20)

Formula formula_46:
• $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)$.
• $x, y \subseteq z \Rightarrow \rho(x), \rho(y) \subseteq \rho(z)$.
• $z \subseteq x, y \Rightarrow \rho(z) \subseteq \rho(x), \rho(y)$.

Formula formula_47: $y \in N(x) \Rightarrow \rho(x) \neq \rho(y)$. (22)

Formula formula_49: T p ′ 1 ,...,p ′ ℓ are ℓ-dimensional tori such that p 1 • • • p ℓ = p ′ 1 • • • p ′ ℓ (i.

Formula formula_50: ) 0 = p 1 • • • p ℓ = p ′ 1 • • • p ′ ℓ = (T p ′ 1 ,...,p ′ ℓ ) 0 ),

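Formula formula_51: $\mathrm{diam}_{(\mathrm{co})A_{r_1,r_2}}(X) = \max_{x, x' \in X_{r_1}} d_{(\mathrm{co})A_{r_1,r_2}}(x, x')$, (23)

Formula formula_52: $\mathrm{diam}^{k}_{(\mathrm{co})A_{r_1,r_2}}(X) = \max_{x \in X_{r_1},\, y \in X_k} \ \min_{x' \subseteq y} d_{(\mathrm{co})A_{r_1,r_2}}(x, x')$. (24)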
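Both diameters in Equations (23) and (24) reduce to shortest-path computations. The sketch below covers only the case $r_1 = 0$, $r_2 = 1$ (vertex distances along edges), assumes a connected complex, and takes the 1-skeleton as an adjacency dictionary together with a list of rank-$k$ cells given as vertex sets. The function names and the example complex are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Eq. (23) for A_{0,1}: the largest vertex-to-vertex distance."""
    return max(max(bfs_distances(adj, u).values()) for u in adj)

def diameter_k(adj, k_cells):
    """Eq. (24) for A_{0,1}: the largest vertex-to-cell distance, where the
    distance to a cell is the distance to its nearest vertex."""
    best = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        for cell in k_cells:
            best = max(best, min(dist[w] for w in cell))
    return best

# Hypothetical example: triangle {0,1,2} filled by one 2-cell, with a tail 2-3-4.
example_adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
example_faces = [{0, 1, 2}]
```

On the example, the ordinary diameter is 3 (from vertex 0 to vertex 4), while the vertex-to-face diameter is 2, since vertex 4 is two steps away from the nearest vertex of the face.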

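Formula formula_53: $T_{p'_1,\ldots,p'_\ell}$ are $\ell$-dimensional tori satisfying 1. $p_1 \cdots p_\ell = p'_1 \cdots p'_\ell$, 2. $\forall j \in \{1, \ldots, \ell\}$, $p_j, p'_j \ge 3$, and 3. $\sum_{j=1}^{\ell} \lfloor \frac{p_j}{2} \rfloor \neq \sum_{j=1}^{\ell} \lfloor \frac{p'_j}{2} \rfloor$, then $\mathrm{diam}_{A_{0,1}}(T_{p_1,\ldots,p_\ell}) \neq \mathrm{diam}_{A_{0,1}}(T_{p'_1,\ldots,p'_\ell})$ (25)

Formula formula_54: $\mathrm{diam}_{A_{0,1}}(T_{p_1,\ldots,p_\ell}) = \sum_{j=1}^{\ell} \mathrm{diam}(\mathrm{Cyc}(p_j)) = \sum_{j=1}^{\ell} \lfloor \frac{p_j}{2} \rfloor \neq \sum_{j=1}^{\ell} \lfloor \frac{p'_j}{2} \rfloor = \sum_{j=1}^{\ell} \mathrm{diam}(\mathrm{Cyc}(p'_j)) = \mathrm{diam}_{A_{0,1}}(T_{p'_1,\ldots,p'_\ell})$. (27)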
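The identity $\mathrm{diam}_{A_{0,1}}(T_{p_1,\ldots,p_\ell}) = \sum_j \lfloor p_j/2 \rfloor$ can be sanity-checked by breadth-first search on the torus grid. This is a minimal sketch under the assumption that the $A_{0,1}$ graph of the torus is the product of cycles $\mathrm{Cyc}(p_j)$; because that graph is vertex-transitive, the eccentricity of a single vertex equals the diameter. The function names are hypothetical.

```python
from collections import deque

def torus_diameter(p):
    """Diameter of the product of cycles Cyc(p_1) x ... x Cyc(p_l), via BFS
    from the origin (sufficient because the graph is vertex-transitive)."""
    start = tuple(0 for _ in p)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for j, pj in enumerate(p):
            for step in (1, -1):
                t = s[:j] + ((s[j] + step) % pj,) + s[j + 1:]
                if t not in dist:
                    dist[t] = dist[s] + 1
                    queue.append(t)
    return max(dist.values())

def predicted_diameter(p):
    """Closed form from Eq. (27): the sum of floor(p_j / 2)."""
    return sum(pj // 2 for pj in p)
```

As an example, $T_{3,12}$ and $T_{6,6}$ both have 36 vertices, yet their diameters are 7 and 6 respectively, so they satisfy the three conditions above.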

Formula formula_55: $T' = T_{p^1_1,\ldots,p^1_\ell} \sqcup T_{p^2_1,\ldots,p^2_\ell}$ be a disjoint union of two disconnected tori. If $p_1 \cdots p_\ell = p^1_1 \cdots p^1_\ell + p^2_1 \cdots p^2$

Formula formula_56: $\neq H_r(T')$, $b_r(T) \neq b_r(T')$.

Formula formula_57: 2, $H_r(T') = H_r(T^1) \times H_r(T^2) = \mathbb{Z}^{\binom{\ell}{r}} \times \mathbb{Z}^{\binom{\ell}{r}} = \mathbb{Z}^{2\binom{\ell}{r}}$. Therefore, $\forall r \in \{0, \ldots, \ell\}$, $H_r(T) \neq H_r(T')$ and $b_r(T) = \binom{\ell}{r} \neq 2\binom{\ell}{r} = b_r(T')$.
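The Betti-number comparison above is simple arithmetic and can be written out directly. The sketch below takes as given the values stated in the text, $b_r(T) = \binom{\ell}{r}$ for an $\ell$-dimensional torus, and uses the fact that Betti numbers add over disjoint unions. The function names are hypothetical.

```python
from math import comb

def torus_betti(ell):
    """Betti numbers b_0, ..., b_l of an l-dimensional torus: b_r = C(l, r)."""
    return [comb(ell, r) for r in range(ell + 1)]

def disjoint_union_betti(*betti_lists):
    """Betti numbers of a disjoint union: rank-wise sums."""
    return [sum(bs) for bs in zip(*betti_lists)]
```

For $\ell = 2$ this gives $[1, 2, 1]$ for a single torus and $[2, 4, 2]$ for a disjoint union of two tori, so the two differ in every rank.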

Formula formula_58: $\mathbb{Z}^2 \to \mathbb{Z}^2$ by $\rho^{h,p}_{\mathrm{cyl}}(s) = (s_1, s_2 \bmod p)$, (28) $\rho^{h,p}_{\text{möb}}(s) = \begin{cases} (s_1, s_2 \bmod r) & s_2 \bmod 2p \le p \\ (h + 1 - s_1, s_2 \bmod r) & s_2 \bmod 2p > p. \end{cases}$ (29)

Formula formula_60: $S = [h] \times [p]$, (30)

Formula formula_61: $X_r = \{s_k \mid s \in S,\ k \in \{0,1\}^2,\ k_1 + k_2 = r,\ \rho^{h,p}_{\mathrm{cyl}}(s + k) \in S\}$, (31)

Formula formula_62: $X = X_0 \cup X_1 \cup X_2$, (32)

Formula formula_64: $s_k = \{\rho^{h,p}_{\mathrm{cyl}}(s + k') \mid k' \in \{0,1\}^2,\ k' \le k\}$. (33)

Formula formula_66: $S = [h] \times [p]$, (34)

Formula formula_67: $X_r = \{s_k \mid s \in S,\ k \in \{0,1\}^2,\ k_1 + k_2 = r,\ \rho^{h,p}_{\text{möb}}(s + k) \in S\}$, (35) $X = X_0 \cup X_1 \cup X_2$, (36)

Formula formula_68: $s_k = \{\rho^{h,p}_{\text{möb}}(s + k') \mid k' \in \{0,1\}^2,\ k' \le k\}$. (37)

Formula formula_70: $k_1 + k_2 = r$. For every $k' \le k$, $\rho(\rho^{h,2p}_{\mathrm{cyl}}(s + k')) = \rho^{h,p}_{\mathrm{cyl}}(\rho(s) + k')$, (38)

Formula formula_71: so $\rho(s_k) = \rho(s)_k$. Additionally, $\rho'(\rho^{h,2p}_{\mathrm{cyl}}(s + k')) = \begin{cases} \rho^{h,p}_{\text{möb}}(\rho'(s) + k') & s_1 \le p \\ \rho^{h,p}_{\text{möb}}(\rho'(s) + (-k'_1, k'_2)) & s_1 > p. \end{cases}$ (39)

Formula formula_72: so $\rho'(s_k) = \begin{cases} \rho'(s)_k & s_1 \le p \\ (\rho'(s) + (-1, 0))_k & s_1 > p. \end{cases}$ (40)

Formula formula_74:
• $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)$ and $\rho'(x) \subseteq \rho'(y)$.
• $x, y \subseteq z \Rightarrow \rho(x), \rho(y) \subseteq \rho(z)$ and $\rho'(x), \rho'(y) \subseteq \rho'(z)$.
• $z \subseteq x, y \Rightarrow \rho(z) \subseteq \rho(x), \rho(y)$ and $\rho'(z) \subseteq \rho'(x), \rho'(y)$.

Formula formula_75: $G = (V, E)$ is a combinatorial complex denoted by $3\text{-}\mathrm{CL}(G)$, with $S = V$, $X_0 = \{\{v\} \mid v \in V\}$, $X_1 = E$, and $X_2 = \{\{x, y, z\} \mid x \sim y,\ x \sim z,\ y \sim z\}$.

Formula formula_76: $\rho(a_i) = a'_{i \bmod n \cdot k}, \quad \rho(b_i) = b'_{i \bmod k}$. (41)

Formula formula_77: $u \sim_G v \Rightarrow \rho(u) \sim_{G^*} \rho(v)$. (42)

Formula formula_78: $\mathrm{rk}(x) = \begin{cases} 0 & x \in V \\ 1 & x \in E \\ 2 & x \in V_{\mathrm{MOG}}. \end{cases}$

Formula formula_79: $x_{i,n} = x_i \times V_n$ and $x'_{i,n} = x'_i \times V_n$ respectively for all $i \in [3]$. Defining $X_n = \mathrm{MOG}(G_n)$, $X'_n = \mathrm{MOG}(G'_n)$

Formula formula_80: $\rho_n : S \times V_n \to S \times V_n$ by $\rho_n(s, v) = (\rho(s), v)$ and $\rho'_n : S \times V_n \to S' \times V_n$ by $\rho'_n(s, v) = (\rho'(s), v)$

Formula formula_81: $d_{G_n}((s, v), (s', v')) = d_G(s, s') + d_{\mathrm{Cyc}(n)}(v, v')$. (43)

Formula formula_83: $\mathrm{diam}^2_{A_{0,1}}(X_n) = \mathrm{diam}^2_{A_{0,1}}(X) = 3$. The same reasoning shows that $\mathrm{diam}^2_{A_{0,1}}(X'_n) = \mathrm{diam}^2_{A_{0,1}}(X') = 2$, concluding the proof.

Formula formula_84: $C^{k}(X, \mathbb{R}^d) = \{h^{k} \mid h^{k} : X_0^{k_0} \times \cdots \times X_\ell^{k_\ell} \to \mathbb{R}^d\}$. (44)

Formula formula_85: X i → R d , i.e. C k (X , R d

Formula formula_86: $h(x, y) = \begin{cases} 1 & y \in B_{r_1,r_2}(x) \\ 0 & \text{otherwise.} \end{cases}$ (45)

Formula formula_88: $h(x, y) = \begin{cases} 1 & y \in (\mathrm{co})A_{r_1,r_2}(x) \\ 0 & \text{otherwise.} \end{cases}$ (46)

Formula formula_90: $X^{k} = X_0^{k_0} \times \cdots \times X_\ell^{k_\ell}$. The group $S_{n_0} \times \cdots \times S_{n_\ell}$ is denoted by $G$.

Formula formula_91: C k (X , R d ) → C k ′ (X , R d ′ ) for each pair of tuples k, k ′ . Since the space C k can be identified with R n k 0 0 ×•••×n k ℓ

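Formula formula_92: $C^{k} \otimes C^{k'} = \mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d \times n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$. (47)

Formula formula_94: $X^{k} \times [d] \times X^{k'} \times [d']$.

Formula formula_95: $j_1 \in [d_1]$, $j_2 \in [d_2]$, we define a matrix $B^{\gamma,j_1,j_2} \in \mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d \times n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d'}$: $B^{\gamma,j_1,j_2}_{a,i_1,b,i_2} = \begin{cases} 1 & (a, b) \in \gamma,\ i_1 = j_1,\ i_2 = j_2 \\ 0 & \text{otherwise.} \end{cases}$ (48)

Formula formula_97: $\in [n_0]^{k_0} \times \cdots \times [n_\ell]^{k_\ell}$, $b \in [n_0]^{k'_0} \times \cdots \times [n_\ell]^{k'_\ell}$, $i_1 \in [d]$, $i_2 \in [d']$.

Formula formula_98: $h'(b)_j = \begin{cases} \sum_{(a,b) \in \gamma} h(a)_{j_1} & j = j_2 \\ 0 & \text{otherwise.} \end{cases}$ (49)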

Formula formula_100: $\mathbb{R}^{n_0^{k_0} \times \cdots \times n_\ell^{k_\ell} \times d} \to \mathbb{R}^{n_0^{k'_0} \times \cdots \times n_\ell^{k'_\ell} \times d}$ in Maron et al. (

Formula formula_101: $C^{k} \to C^{k'}$

Formula formula_102: $F(h) = \beta\Big(\sum_{\gamma, j_1, j_2} w_{\gamma,j_1,j_2} B^{\gamma,j_1,j_2} h\Big)$. (50)

Formula formula_104: MCN tensor diagram, if $v$ is a node labeled by $C^{k}$ we compute a multi-cellular cochain $h^{(v)} \in C^{k}$ by $h^{(v)}_x = \bigotimes_{u \in \mathrm{pred}(v)} m_{u,v}(x)$ (51)

Formula formula_105: $\bigoplus_{y \in N(x)} \mathrm{MLP}_{u,v}\big(h^{(u)}_x, h^{(u)}_y\big)$. (52)

Formula formula_106: of $O(d \cdot n_0 \cdot n_1)$ and $O(d \cdot n_0 \cdot n_2)$.

Formula formula_107: 1. $\mathrm{rk}(x) = \mathrm{rk}'(\rho(x))\ \forall x \in X$, 2. $x \subseteq y \Rightarrow \rho(x) \subseteq \rho(y)\ \forall x, y \in X$.

Formula formula_108: $M(X) \neq M(X')$. (56)

Formula formula_110: $V = X$, (57) $V' = X'$, (58) $E = \{(x, y) \in X \times X \mid x \subseteq y,\ \mathrm{rk}(x) = \mathrm{rk}(y) - 1\}$, (59)

Formula formula_111: $E' = \{(x', y') \in X' \times X' \mid x' \subseteq y',\ \mathrm{rk}'(x') = \mathrm{rk}'(y') - 1\}$. (60)

Formula formula_112: $A_{r_1,r_2} = \begin{cases} 0_{n_i \times n_j} & r_1 \neq r_2 + 1 \\ B_{r_1,r_2} & r_1 = r_2 + 1, \end{cases}$ (61)

Formula formula_113: $Q := \bigtimes_{r_1=0, r_2=0}^{\ell} C^{e_{r_1} + e_{r_2}}(X, \mathbb{R})$.

Formula formula_114: $Q^{\otimes k} \to Q^{\otimes k'}$

Formula formula_115: • • • + n ℓ-1 + 1, . . . , n 0 + • • • + n ℓ }; G ∼ = S n0 × • • • × S n ℓ ⊆ [n].

Formula formula_116: $Q^{\otimes k} \to Q^{\otimes k'}$.

Formula formula_117: T (h)(x 0 , . . . x ℓ ) = ℓ r1=0,r2=0 h r1,r2 (x r1 , x r2 ),(62)

Formula formula_118: C k•1 ℓ+1 (X , R (ℓ+1) 2 ) → C k ′ •1 ℓ+1 (X , R(

Formula formula_119: $X$ of dimension $\ge r_1, r_2, r$, $M(H_{(\mathrm{co})A_{r_1,r_2}}) = M'(X)$.

Formula formula_120: $\mathrm{MLP}_{r,r'}(x, y) = \begin{cases} \mathrm{MLP}(x, y) & \text{if } r = r_2 \text{ and } r' = r_1, \\ 0 & \text{otherwise} \end{cases}$ (63)

Formula formula_121: Proposition F.3 (SMCN can compute diameters). If $X, X'$ are CCs such that $\mathrm{diam}^{r}_{A_{r_1,r_2}}(X) \neq \mathrm{diam}^{r}_{A_{r_1,r_2}}(X')$, (64)

Formula formula_122: $h^{(T)}_{u,v} = d_G(u, v)$ for $u, v \in V$. (65)

Formula formula_123: (T )

Formula formula_124: $h^{(T)}_{S,v} = d_G(S, v)$ for $v \in V$ and $S \in V^*$. (66)

Formula formula_125: (T )

Formula formula_126: $M(\mathrm{Cyl}_{h,p}) \neq M(\text{Möb}_{h,p})$. (67)

Formula formula_127: $h^{(1)}(x) = \deg_{B_{1,2}}(x)$. (68)

Formula formula_128: $h^{(2)}_{x_1,x_2} = \deg_{B_{1,2}}(x_1) \,\|\, \deg_{B_{1,2}}(x_2)$, (69)

Formula formula_129: $h^{(3)}_{x_1,x_2} = (h_{\mathrm{co}A_{1,0}})_{x_1,x_2} \,\|\, \deg_{B_{1,2}}(x_1) \,\|\, \deg_{B_{1,2}}(x_2)$. (70)

Formula formula_130: (4) x1,x2 = MLP(h (3) x1,x2

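Formula formula_131: $\mathrm{MLP}(a, b, c) = \begin{cases} 1 & a = b = c = 1 \\ 0 & \text{otherwise.} \end{cases}$ (71)

Formula formula_132: $h^{(t)}_{u,v} = d_G(u, v)$ for $u, v \in V$. (72)

Formula formula_133: $g_1(x) = \begin{cases} 0 & \text{if } x = -1, \\ 1 & \text{if } x \ge -\frac{1}{2}. \end{cases}$ (73)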

Formula formula_134: (t)

Formula formula_135: $h^{(t+1)}_{u,v} = \begin{cases} 0 & \text{if } v \notin G_u, \\ 1 & \text{if } v \in G_u. \end{cases}$ (74)

Formula formula_136: $h^{(t+2)}_{u,v} = \sum_{v' \in V} h^{(t+1)}_{u,v'}$, to get $h^{(t+2)}_{u,v} = |G_u|$. (75)

Formula formula_137: Define $g_2 : [1, |V|] \to \mathbb{R}$ to be $g_2(x) = \frac{1}{x}$. (76)

Formula formula_138: (t+2) u,v

Formula formula_139: $h^{(t+3)}_{u,v} = \frac{1}{|G_u|}$. (77)

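Formula formula_141: $h^{(T)}_{u,v}$ by $h_{\mathrm{out}} = \sum_{u,v \in V} h^{(T)}_{u,v}$. (78)

Formula formula_143: $h_{\mathrm{out}} = \sum_{u,v \in V} \frac{1}{|G_u|} = \sum_{G^* \in C(G)} \sum_{u \in G^*} \frac{|V|}{|G^*|} = \sum_{G^* \in C(G)} |V| = |V|\,|C(G)|$. (79)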
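The counting argument in Equation (79) is easy to verify on a small graph: summing $1/|G_u|$ over all ordered pairs $(u, v)$ gives $|V|$ times the number of connected components. The sketch below is illustrative; it replaces the learned updates with an explicit component search, uses exact fractions, and all names are hypothetical.

```python
from fractions import Fraction

def component_size_of_each_node(adj):
    """Map every node u to |G_u|, the size of its connected component."""
    size = {}
    for root in adj:
        if root in size:
            continue
        seen, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        for u in seen:
            size[u] = len(seen)
    return size

def readout(adj):
    """h_out = sum over all ordered pairs (u, v) of 1 / |G_u|."""
    size = component_size_of_each_node(adj)
    return sum(Fraction(1, size[u]) for u in adj for _v in adj)

def count_components(adj):
    """Recover |C(G)| by dividing the readout by |V|."""
    return readout(adj) / len(adj)

# Hypothetical example with components {0,1,2}, {3,4} and {5}.
example_graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3], 5: []}
```

On the example graph, $|V| = 6$ and there are 3 components, so the readout equals 18.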

Formula formula_144: $\chi(M) = k_2 - k_1 + k_0 = |X_2| - |X_1| + |X_0|$. (80)
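Equation (80) only needs the number of cells of each rank. The following sketch assumes cells are supplied as collections keyed by rank; the two example complexes (the boundary of a tetrahedron and a $3 \times 3$ square-grid torus) are hypothetical illustrations, not datasets from the paper.

```python
from itertools import combinations

def euler_characteristic(cells_by_rank):
    """Alternating sum of cell counts; for ranks 0..2 this is |X_2| - |X_1| + |X_0|."""
    return sum((-1) ** r * len(cells) for r, cells in cells_by_rank.items())

def tetrahedron_boundary():
    """Triangulated sphere: 4 vertices, 6 edges, 4 triangles."""
    vertices = range(4)
    return {
        0: [frozenset({v}) for v in vertices],
        1: [frozenset(e) for e in combinations(vertices, 2)],
        2: [frozenset(f) for f in combinations(vertices, 3)],
    }

def grid_torus(p1=3, p2=3):
    """Square-grid torus, with cells identified by their base point (and direction)."""
    points = [(i, j) for i in range(p1) for j in range(p2)]
    return {
        0: points,
        1: [(pt, direction) for pt in points for direction in (0, 1)],
        2: points,  # one square face per base point
    }
```

The tetrahedron boundary gives $4 - 6 + 4 = 2$, the Euler characteristic of a sphere, while the grid torus gives $9 - 18 + 9 = 0$.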

Formula formula_146: h x = 1 x

Formula formula_147: m u,v (x, y) = ℓ r=0,r ′ =0 MLP r,r ′ h (t) x,y , h(t)

Formula formula_149: $O(m_{\deg} \cdot n_0 \cdot n_2)$. Thus the overall runtime of our SMCN model is $O(m_{\deg} \cdot n_0 \cdot n_2 \cdot T)$, completing the proof.

Formula formula_150: [Figure 12: CIN Block. Nodes $C^0$, $C^1$, $C^2$; edges labeled $A_{0,1}$, $B_{0,1}$, $A_{1,2}$, $B_{1,2}$, Id.]

Formula formula_151: $(h^{(0)}_2)_x = (h^{(0)}_0)_{B^{\top}_{2,0}(x)} = \sum_{x' \in B^{\top}_{2,0}(x)} (h^{(0)}_0)_{x'}$. (82)

Formula formula_153: (t) 0 , h(t)

Formula formula_154: $h^{(t+1)}_x = \mathrm{MLP}^{(t)}_{0,1}\Big((1 + \epsilon_0) h^{(t)}_x + \sum_{x' \in A_{0,1}(x)} \mathrm{MLP}^{(t)}_{0,2}\big(h^{(t)}_{x'}, h^{(t)}_{E(x,x')}\big)\Big)$, (83)

Formula formula_155: for x ∈ X 1 , h (t+1) x = Cat   (1 + ϵ 1 1 )h (t) x + h (t) B ⊤ 1,0 (x) (1 + ϵ 2 1 )h (t) x + x ′ ∈A1,2(x) MLP (t) 0,2 h (t) x ′ h (t) E(x,x ′ )   (84) and for x ∈ X 2 , h(t+1)

Formula formula_156: x = MLP (t) 2 (1 + ϵ 2 )h (t) x + h (t)1 B ⊤ 2,1 (x) ,(85)

Formula formula_157: if x, x ′ ∈ X r , E(x, x ′ ) = {y ∈ X r+1 | x, x ′ ⊆ y} and ϵ 0 , ϵ 1 1 , ϵ 2 1 , ϵ 2 are non-learnable hyperparameters. C 0 C 1 C 2 A0,1 B0,2 B ⊤2

Formula formula_158: x = Cat (1 + ϵ 1 0 )h (t) x + h (t) B0,2(x) (1 + ϵ 2 0 )h (t) x + h (t) A0,1(x) (86) for x ∈ X 1 , h (t+1) x = h (t) x , (87) and for x ∈ X 2 h (t+1) x = MLP (t) (1 + ϵ 2 )h (t) x + h (t) B ⊤ 2,0 (x) .(t+1)

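Formula formula_160: $(h^{(t)}_{0,r})_{x_1,x_2} = \mathrm{MLP}^{(t)}_{r,1}\big((h^{(t)}_0)_{x_1}\big) + \mathrm{MLP}^{(t)}_{r,2}\big((h^{(t)}_r)_{x_2}\big) + \mathrm{MLP}^{(t)}_{r,3}\big(\mathrm{mark}(x_1, x_2)\big)$, (89)

Formula formula_161: $\mathrm{mark}_B(x_1, x_2) = \begin{cases} 1 & x_1 \in B_{0,r}(x_2) \\ 0 & \text{otherwise.} \end{cases}$ (90)

Formula formula_162: $D(x_1, x_2) = \min_{x \in B^{\top}_{r,0}(x_2)} d_{A_{0,1}}(x_1, x)$, (91)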

Formula formula_163: $h^{(t+1)}_{x_1,x_2} = \mathrm{Cat}_1\Big((1 + \epsilon^1_1) h^{(t+1)}_{x_1,x_2} + \sum_{x' \in A_{0,1}(x_1)} \mathrm{MLP}^{(t)}_1\big(h^{(t)}_{x',x_2}, h^{(t)}_{E(x_1,x')}\big),\ (1 + \epsilon^2_1) h^{(t)}_{x_1,x_2} + h^{(t)}_{x_1,B_{0,1}(x_1)}\Big)$ (92) for $x_1 \in X_0$, $x_2 \in X_1$, and $h^{(t+1)}_{x_1,x_2} = \mathrm{Cat}_2\Big((1 + \epsilon^1_2) h^{(t+1)}_{x_1,x_2} + \sum_{x' \in A_{0,1}(x_1)} \mathrm{MLP}^{(t)}_2\big(h^{(t)}_{x',x_2}, h^{(t)}_{E(x_1,x')}\big),\ (1 + \epsilon^2_2) h^{(t)}_{x_1,x_2} + h^{(t)}_{x_1,B_{0,2}(x_1)}\Big)$ (93) for $x_1 \in X_0$, $x_2 \in X_2$, respectively.

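Formula formula_164: $(h_0)^{(t)}_x = \mathrm{MLP}^{(t)}_0\big((h^{(t)}_{0,r})_{x,X_r}\big) = \mathrm{MLP}^{(t)}_0\Big(\sum_{x' \in X_r} (h^{(t)}_{0,r})_{x,x'}\Big), \quad (h_r)^{(t)}_x = \mathrm{MLP}^{(t)}_r\big((h^{(t)}_{0,r})_{X_0,x}\big) = \mathrm{MLP}^{(t)}_r\Big(\sum_{x' \in X_0} (h^{(t)}_{0,r})_{x',x}\Big)$ (94)

Formula formula_165: $h_{\mathrm{out}} = \mathrm{MLP}\Big(\mathrm{agg}_0\{\!\{h^{(T)}_x \mid x \in X_0\}\!\} + \mathrm{agg}_1\{\!\{h^{(T)}_x \mid x \in X_1\}\!\} + \mathrm{agg}_2\{\!\{h^{(T)}_x \mid x \in X_2\}\!\} + \mathrm{agg}_3\{\!\{h^{(T)}_{x_1,x_2} \mid x_1 \in X_0, x_2 \in X_1\}\!\} + \mathrm{agg}_4\{\!\{h^{(T)}_{x_1,x_2} \mid x_1 \in X_0, x_2 \in X_2\}\!\}\Big)$, (95)

Formula formula_167: $\max_{x \in X_0,\, y \in X_2} \ \min_{x' \in y} d_{A_{0,1}}(x, x')$ (96)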

