Algorithmic Complexity Predicts when Path Information Improves Graph Neural Networks Performance on Molecular Graphs
Abstract: Graph Neural Networks (GNNs) are designed to process irregular relational data in
recommendation systems, protein networks, social networks, and molecules. GNNs typically
rely on message passing and aggregation, with some architectures incorporating graph path
information in a bid to improve accuracy. However, it is unclear whether such incorporation
of path information truly improves GNN accuracy in all cases. As a first step, we herein
shed light on this issue for the case of molecular graphs. We evaluated Graphormer and
Mix-Hop models, with and without path information on 36 molecular datasets, derived from
six MoleculeNet benchmark datasets. Path information improved performance in some cases
but not in others. This finding is important because these two models always incorporate
path information in practice, yet our results show that doing so can actually be
detrimental to their accuracy. To more deeply probe
this observation, we developed a graph representation model called T-hop which allows us
to further highlight the use, versus non-use, of path information. On the one hand, we
formulated the Path Usefulness Measure (PUM) to quantify the benefit of path information. On
the other hand, we quantified the randomness of the different datasets via their algorithmic
complexities, using the Block Decomposition Method (BDM). We hypothesized, and then
confirmed, that GNN models trained on molecular datasets with less random
structures (i.e., lower algorithmic complexity) benefit more from path information (i.e.,
larger PUM) than models trained on datasets with more random structures. In summary, low
algorithmic complexity, which captures the presence of structure in molecular graphs, is useful
for predicting when path information improves accuracies in GNNs. A practical benefit of
this is that it leads to a more resource-efficient approach, wherein path information is only
incorporated for datasets with low algorithmic complexities.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: N/A
Assigned Action Editor: ~Giannis_Nikolentzos1
Submission Number: 6586