Abstract: Scalability of Graph Neural Networks (GNNs) remains a significant challenge. To address it, methods such as coarsening, condensation, and computation trees are used to train on a smaller graph, resulting in faster computation. Nonetheless, prior research has not adequately addressed the computational cost of the inference phase. This paper presents a novel approach to improving the scalability of GNNs by reducing the computational burden during inference using graph coarsening. We demonstrate two methods -- Extra Nodes and Cluster Nodes. Our study extends the application of graph coarsening to graph-level tasks, including graph classification and graph regression. We conduct extensive experiments on multiple benchmark datasets to evaluate the performance of our approach. Our results show that the proposed method achieves orders-of-magnitude improvements in single-node inference time compared to traditional approaches. Furthermore, it significantly reduces memory consumption for node and graph classification and regression tasks, enabling efficient training and inference on low-resource devices where conventional methods are impractical. Notably, these computational advantages are achieved while maintaining competitive performance relative to baseline models.
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
## Notation & Formulation Fixes
- **Adjacency matrix generalized**: We changed the adjacency matrix to $A \in \mathbb{R}^{n \times n}$ (weighted), with $A_{ij}$ explicitly defined as the edge weight between nodes $v_i$ and $v_j$.
- **Coarsening ratio formula tightened**: The number of coarsened nodes is now computed as $k = \lfloor n \times r \rfloor$ instead of $k = n \times r$, so that $k$ is always an integer (see the sketch below).
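A minimal sketch of both conventions follows; the helper `coarsen_size` and the toy weighted matrix are illustrative only and are not taken from the released code.

```python
import math
import numpy as np

def coarsen_size(n: int, r: float) -> int:
    """Number of coarsened nodes: k = floor(n * r)."""
    return math.floor(n * r)

# Weighted adjacency A in R^{n x n}; A[i, j] is the edge weight between v_i and v_j.
n = 5
A = np.zeros((n, n))
A[0, 1] = A[1, 0] = 0.7   # weighted edge between v_0 and v_1
A[1, 2] = A[2, 1] = 1.5   # weighted edge between v_1 and v_2

print(coarsen_size(n, r=0.5))  # -> 2 coarsened nodes at ratio r = 0.5
```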
---
## Expanded Analysis: Node Regression Performance
We added an explanation for the counterintuitive result that FIT-GNN *outperforms* full-graph inference on node regression tasks. Two phenomena are identified:
1. **Homogeneous local contexts**: Partitioning the graph into subgraphs creates local contexts where label standard deviation is dramatically lower than the global variation. This presents a smoother optimization landscape during inference, allowing the model to specialize more effectively.
2. **Implicit adversarial pruning**: In heterophilic graphs, long-range 2nd-hop neighborhood information acts as noise or an adversarial signal rather than useful context. The coarsening process implicitly filters out this distant noise, enabling the model to fully exploit the low-variance local structures and thereby reduce regression error.
This explanation is supported by the new Appendix G (detailed below).
---
## New Appendix C.2 — Preprocessing Time Complexity
A comparative analysis of preprocessing overhead against other state-of-the-art scaling methods was added.
- **Table 9** compares asymptotic complexities across methods.
- **Table 10** analyzes inference strategies when a new test node $v$ is added to graph $G$.
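To make the single-new-node scenario of Table 10 concrete, the toy sketch below (using `networkx`; the partition assignment and the way the new node is attached are hypothetical, not the procedure in the appendix) contrasts the number of nodes a full-graph forward pass must touch with the size of a subgraph-restricted input.

```python
import networkx as nx

# Toy graph and a fixed 2-way node partition (stand-in for the coarsening-based partitioner).
G = nx.karate_club_graph()
partition = {v: (0 if v < 17 else 1) for v in G.nodes}

# A new test node v attaches to a few existing neighbors and inherits their partition.
v = max(G.nodes) + 1
G.add_edges_from([(v, 0), (v, 5)])
partition[v] = partition[0]

# Full-graph inference: message passing touches every node in G.
full_input_size = G.number_of_nodes()

# Subgraph inference: only the nodes in v's partition are loaded.
sub_nodes = [u for u, p in partition.items() if p == partition[v]]
sub_input_size = len(G.subgraph(sub_nodes))

print(full_input_size, sub_input_size)  # e.g. 35 vs. 18
```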
---
## New Appendix G — Detailed Ablation Study on Node Regression
A comprehensive three-part ablation study on the Crocodile, Chameleon, and Squirrel datasets was added to explain the gains in node regression performance.
### G.1 — Impact of Inference Input vs. Training Regime
A controlled experiment on the Crocodile dataset isolates whether the performance gain stems from the training methodology or the inference input structure.
| Train Setup | Inference Setup | MAE |
|-------------|----------------|-----|
| Full Graph | Full Graph | 0.852 |
| Subgraphs | Full Graph | 0.865 |
| Subgraphs | Subgraphs | 0.364 |
Training on subgraphs alone does not account for the MAE reduction; the substantial gain emerges only when subgraphs are used as the *inference input*, confirming that the local structural input, not the training regime, drives the improvement.
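For concreteness, a minimal sketch of the two inference setups compared above is given below, assuming PyTorch Geometric; `GNNRegressor`, `mae_full_graph`, `mae_subgraphs`, and the node-to-subgraph assignment `parts` are illustrative names and do not reproduce the released FIT-GNN code.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv
from torch_geometric.utils import subgraph

class GNNRegressor(torch.nn.Module):
    """Small 2-layer GCN regressor (stand-in model, not the FIT-GNN architecture)."""
    def __init__(self, in_dim, hid=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid)
        self.conv2 = GCNConv(hid, 1)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index).squeeze(-1)

@torch.no_grad()
def mae_full_graph(model, x, edge_index, y, test_mask):
    """'Full Graph' inference: one forward pass over the entire graph."""
    model.eval()
    pred = model(x, edge_index)
    return (pred[test_mask] - y[test_mask]).abs().mean().item()

@torch.no_grad()
def mae_subgraphs(model, x, edge_index, y, test_mask, parts):
    """'Subgraphs' inference: each test node is predicted inside its own subgraph.

    `parts[i]` is the subgraph id of node i, assumed to come from the
    coarsening-based partitioner; its construction is not shown here.
    """
    model.eval()
    abs_errs = []
    for p in parts.unique():
        nodes = (parts == p).nonzero(as_tuple=True)[0]
        ei, _ = subgraph(nodes, edge_index, relabel_nodes=True, num_nodes=x.size(0))
        local_mask = test_mask[nodes]
        if local_mask.any():
            pred = model(x[nodes], ei)
            abs_errs.append((pred[local_mask] - y[nodes][local_mask]).abs())
    return torch.cat(abs_errs).mean().item()
```

Evaluating a single trained model with both functions corresponds to the second and third rows of the table.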
### G.2 — Subgraph Optimization Landscape
Label variation was measured locally (within subgraphs) versus globally to quantify the homogeneity hypothesis.
| Dataset | Metric | Global Variation | Subgraph Variation (Avg) |
|---------|--------|-----------------|--------------------------|
| Cora | Entropy | 1.8311 | 0.1245 |
| Citeseer | Entropy | 1.7533 | 0.1572 |
| Chameleon | Std Dev | 2.1329 | 0.0689 |
| Squirrel | Std Dev | 1.7639 | 0.1284 |
Label variation within individual subgraphs is dramatically lower than global variation, confirming that coarsening creates statistically more homogeneous local contexts.
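The measurement itself is straightforward to reproduce; the sketch below (the helper `label_variation` and the toy data are illustrative) mirrors the Std Dev rows of the table, with entropy replacing the standard deviation for discrete class labels.

```python
import numpy as np

def label_variation(y: np.ndarray, parts: np.ndarray):
    """Global vs. average per-subgraph label standard deviation (regression labels).

    `parts[i]` is the subgraph id of node i.
    """
    global_var = y.std()
    sub_var = np.mean([y[parts == p].std() for p in np.unique(parts)])
    return global_var, sub_var

# Toy example: two internally homogeneous subgraphs with very different label levels.
y = np.array([0.1, 0.2, 0.15, 5.0, 5.1, 4.9])
parts = np.array([0, 0, 0, 1, 1, 1])
print(label_variation(y, parts))  # global std >> average within-subgraph std
```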
### G.3 — Structural Information Loss and Implicit Adversarial Pruning
Histograms of the fraction of the 2nd-hop neighborhood lost per node (at $r = 0.5$) in **Figure 7** reveal a stark contrast between task types:
- **Classification datasets** (Cora, Citeseer): A significant fraction of nodes retain their full 2nd-hop neighborhood (lost fraction close to 0).
- **Regression datasets** (Squirrel, Chameleon): The vast majority of nodes lose nearly all of their 2nd-hop neighborhood (lost fraction close to 1).
Given FIT-GNN's superior performance on regression tasks, this structural loss is interpreted as **implicit adversarial pruning** — in heterophilic graphs, long-range 2nd-hop information introduces noise, and coarsening implicitly filters it out, allowing the model to exploit the low-variance local structures identified in G.2.
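A hedged sketch of how the per-node lost fraction can be computed is given below (using `networkx`; `second_hop_loss` is an illustrative helper, and the exact definition of "lost" used in Figure 7 is an assumption, not taken from the released code).

```python
import networkx as nx

def second_hop_loss(G: nx.Graph, parts: dict) -> dict:
    """Per-node fraction of 2nd-hop neighbors that fall outside the node's subgraph.

    `parts` maps node -> subgraph id. "Lost" is taken here to mean removed from the
    partition-induced subgraph; this is an interpretation of Figure 7.
    """
    loss = {}
    for v in G.nodes:
        dists = nx.single_source_shortest_path_length(G, v, cutoff=2)
        second_hop = {u for u, d in dists.items() if d == 2}
        if not second_hop:
            continue
        kept = {u for u in second_hop if parts[u] == parts[v]}
        loss[v] = 1.0 - len(kept) / len(second_hop)
    return loss

# Toy usage with an arbitrary 2-way partition.
G = nx.karate_club_graph()
parts = {v: (0 if v < 17 else 1) for v in G.nodes}
lost = list(second_hop_loss(G, parts).values())  # values in [0, 1]; cf. Figure 7
```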
Code: https://github.com/Roy-Shubhajit/FIT-GNN
Assigned Action Editor: ~Feng_Zhou9
Submission Number: 6825