Towards Multiscale Graph-based Protein Learning with Geometric Secondary Structural Motifs

Shih-Hsin Wang; Yuhao Huang; Taos Transue; Justin M. Baker; Jonathan Forstater; Thomas Strohmer; Bao Wang

Published: 18 Sept 2025, Last Modified: 07 Jan 2026, Venue: NeurIPS 2025 poster, License: CC BY 4.0

Keywords: Multiscale GNNs, Primary and secondary structures, Proteins, AI for Science

TL;DR: We propose an efficient multiscale graph-based learning framework tailored to proteins.

Abstract: Graph neural networks (GNNs) have emerged as powerful tools for learning protein structures by capturing spatial relationships at the residue level. However, existing GNN-based methods often face challenges in learning multiscale representations and modeling long-range dependencies efficiently. In this work, we propose an efficient multiscale graph-based learning framework tailored to proteins. Our proposed framework contains two crucial components: (1) It constructs a hierarchical graph representation comprising a collection of fine-grained subgraphs, each corresponding to a secondary structure motif (e.g., $\alpha$-helices, $\beta$-strands, loops), and a single coarse-grained graph that connects these motifs based on their spatial arrangement and relative orientation. (2) It employs two GNNs for feature learning: the first operates within individual secondary motifs to capture local interactions, and the second models higher-level structural relationships across motifs. Our modular framework allows a flexible choice of GNN in each stage. Theoretically, we show that our hierarchical framework preserves the desired maximal expressiveness, ensuring no loss of critical structural information. Empirically, we demonstrate that integrating baseline GNNs into our multiscale framework remarkably improves prediction accuracy and reduces computational cost across various benchmarks.
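To make the two components concrete, here is a minimal NumPy sketch of the idea, not the authors' implementation. It assumes per-residue coordinates and a per-residue secondary-structure string as input. The function names, the distance cutoffs, the centroid-distance rule for connecting motifs (the abstract also uses relative orientation, which is omitted here), and the mean-aggregation stand-in for a GNN layer are all illustrative assumptions.

```python
import numpy as np
from itertools import groupby


def segment_motifs(ss_labels):
    """Split a per-residue secondary-structure string into contiguous runs."""
    motifs, start = [], 0
    for label, run in groupby(ss_labels):
        length = len(list(run))
        motifs.append((label, list(range(start, start + length))))
        start += length
    return motifs


def radius_edges(points, cutoff):
    """Undirected edges (i, j) with i < j between points closer than cutoff."""
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    i, j = np.nonzero(np.triu(dists < cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))


def build_hierarchy(coords, ss_labels, fine_cutoff=8.0, coarse_cutoff=15.0):
    """Component (1): per-motif subgraphs plus one coarse graph over motifs.

    Cutoffs are hypothetical values chosen for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    motifs = segment_motifs(ss_labels)
    fine = []
    for label, idx in motifs:
        local = radius_edges(coords[idx], fine_cutoff)
        # Map local indices back to global residue indices.
        edges = [(idx[a], idx[b]) for a, b in local]
        fine.append({"label": label, "residues": idx, "edges": edges})
    # Assumption: motifs are linked by centroid distance only.
    centroids = np.stack([coords[idx].mean(axis=0) for _, idx in motifs])
    coarse_edges = radius_edges(centroids, coarse_cutoff)
    return fine, coarse_edges


def mean_aggregate(features, edges):
    """One round of mean aggregation over self and neighbors (GNN stand-in)."""
    features = np.asarray(features, dtype=float)
    total = features.copy()
    count = np.ones(len(features))
    for a, b in edges:
        total[a] += features[b]
        total[b] += features[a]
        count[a] += 1
        count[b] += 1
    return total / count[:, None]


def two_stage_forward(features, fine, coarse_edges):
    """Component (2): aggregate within motifs, pool, then aggregate across motifs."""
    fine_edges = [e for motif in fine for e in motif["edges"]]
    hidden = mean_aggregate(features, fine_edges)
    pooled = np.stack([hidden[m["residues"]].mean(axis=0) for m in fine])
    return mean_aggregate(pooled, coarse_edges)


# Toy input: nine residues on a line, 3.8 apart, with three motifs.
coords = [[3.8 * i, 0.0, 0.0] for i in range(9)]
fine, coarse_edges = build_hierarchy(coords, "HHHHLLEEE")
motif_features = two_stage_forward(np.ones((9, 4)), fine, coarse_edges)
```

In this sketch the second stage runs on one node per motif rather than one node per residue, which is the intuition behind the reduced cost the abstract reports. Either `mean_aggregate` call could be swapped for any GNN layer, in line with the modular design described above.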

Supplementary Material: zip

Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)

Submission Number: 11482
