Algorithmic Complexity Predicts when Path Information Improves Graph Neural Networks Performance on Molecular Graphs

TMLR Paper6586 Authors

20 Nov 2025 (modified: 24 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: Graph Neural Networks (GNNs) are designed to process irregular relational data in recommendation systems, protein networks, social networks, and molecules. GNNs typically rely on message passing and aggregation, and some architectures additionally incorporate graph path information in a bid to improve accuracy. However, it is unclear whether incorporating path information improves GNN accuracy in all cases. As a first step, we examine this question for molecular graphs. We evaluated Graphormer and Mix-Hop models, with and without path information, on 36 molecular datasets derived from six MoleculeNet benchmark datasets. Path information improved performance in some cases but not in others. This finding is important because these two models always incorporate path information in practice, yet our results show that doing so can be detrimental to their accuracy. To probe this observation more deeply, we developed a graph representation model called T-hop, which lets us contrast the use and non-use of path information more directly. On the one hand, we formulated the Path Usefulness Measure (PUM) to quantify the benefit of path information. On the other hand, we quantified the randomness of each dataset via its algorithmic complexity, estimated with the Block Decomposition Method (BDM). We hypothesized, and confirmed, that GNN models trained on molecular datasets with less random structures (i.e., lower algorithmic complexity) benefit more from path information (i.e., larger PUM) than models trained on datasets with more random structures. In summary, low algorithmic complexity, which captures the presence of structure in molecular graphs, is useful for predicting when path information improves accuracy in GNNs. A practical benefit is a more resource-efficient approach, in which path information is incorporated only for datasets with low algorithmic complexity.
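The resource-efficient rule described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: it does not implement BDM, and instead swaps in a zlib compression ratio as a crude stand-in for algorithmic complexity; the function names (`adjacency_bits`, `compression_ratio`, `use_path_information`), the toy graph generators, and the `threshold` parameter are all hypothetical. It is meant only to show the shape of the decision, not the authors' procedure.

```python
import random
import zlib


def adjacency_bits(n_nodes, edges):
    """Flatten the upper triangle of an undirected adjacency matrix into a bit string."""
    edge_set = {frozenset(e) for e in edges}
    return "".join(
        "1" if frozenset((i, j)) in edge_set else "0"
        for i in range(n_nodes)
        for j in range(i + 1, n_nodes)
    )


def compression_ratio(n_nodes, edges):
    """Crude complexity proxy (NOT BDM): compressed size divided by raw size."""
    raw = adjacency_bits(n_nodes, edges).encode("ascii")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


def use_path_information(graphs, threshold):
    """Hypothetical rule: enable path features only if the mean proxy score is low.

    `graphs` is a list of (n_nodes, edges) pairs; `threshold` is an assumed,
    user-chosen cutoff that the paper does not specify.
    """
    scores = [compression_ratio(n, e) for n, e in graphs]
    return sum(scores) / len(scores) < threshold


def ring_graph(n):
    """Toy structured graph: a single cycle over n nodes."""
    return n, [(i, (i + 1) % n) for i in range(n)]


def random_graph(n, p, seed):
    """Toy unstructured graph: each edge is present independently with probability p."""
    rng = random.Random(seed)
    return n, [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
```

With these toy generators, `compression_ratio(*ring_graph(200))` is lower than `compression_ratio(*random_graph(200, 0.5, 0))`, mirroring the intuition that more regular structures have lower complexity. A real application would replace the proxy with a BDM estimate and calibrate the cutoff on held-out data.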
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Giannis_Nikolentzos1
Submission Number: 6586