Modeling All-Atom Glycan Structures via Hierarchical Message Passing and Multi-Scale Pre-training

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: This work proposes GlycanAA, an all-atom-wise glycan encoder, and PreGlycanAA, its pre-trained version.
Abstract: Understanding the various properties of glycans with machine learning has shown preliminary promise. However, previous methods mainly modeled the backbone structure of glycans as graphs of monosaccharides (i.e., sugar units), neglecting the atomic structures underlying each monosaccharide, which are in fact important indicators of glycan properties. We fill this gap by introducing GlycanAA, a model for All-Atom-wise Glycan modeling. GlycanAA represents a glycan as a heterogeneous graph, with monosaccharide nodes capturing its global backbone structure and atom nodes capturing its local atomic-level structures. On this graph, GlycanAA performs hierarchical message passing to capture interactions ranging from the local atomic level to the global monosaccharide level. To further enhance model capability, we pre-train GlycanAA on a high-quality unlabeled glycan dataset, deriving the PreGlycanAA model. We design a multi-scale mask prediction algorithm to endow the model with knowledge of the different levels of dependencies within a glycan. Extensive benchmark results show the superiority of GlycanAA over existing glycan encoders and verify the further improvements achieved by PreGlycanAA. We maintain all resources at https://github.com/kasawa1234/GlycanAA.
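To make the hierarchical message passing scheme described above concrete, the following is a minimal PyTorch sketch of one layer operating on a heterogeneous glycan graph with atom nodes and monosaccharide nodes. All names here (`HierarchicalGlycanLayer`, `atom2mono`, the GRU-based updates, mean pooling) are illustrative assumptions and not the authors' implementation; the actual GlycanAA code is in the linked repository.

```python
import torch
import torch.nn as nn

class HierarchicalGlycanLayer(nn.Module):
    """Sketch of one round of hierarchical message passing:
    (1) atom -> atom messages for local atomic-level interactions,
    (2) atom -> monosaccharide pooling across the two node types,
    (3) monosaccharide -> monosaccharide messages along the glycan backbone."""

    def __init__(self, dim):
        super().__init__()
        self.atom_msg = nn.Linear(dim, dim)
        self.mono_msg = nn.Linear(dim, dim)
        self.atom_update = nn.GRUCell(dim, dim)
        self.mono_update = nn.GRUCell(2 * dim, dim)

    def forward(self, h_atom, h_mono, atom_edges, mono_edges, atom2mono):
        # (1) local atomic-level interactions: mean-aggregate messages from bonded atoms
        src, dst = atom_edges
        msg = torch.zeros_like(h_atom).index_add_(0, dst, self.atom_msg(h_atom[src]))
        deg = torch.zeros(h_atom.size(0), 1).index_add_(
            0, dst, torch.ones(dst.size(0), 1)).clamp(min=1)
        h_atom = self.atom_update(msg / deg, h_atom)

        # (2) atom -> monosaccharide aggregation (mean pooling per sugar unit)
        pooled = torch.zeros_like(h_mono).index_add_(0, atom2mono, h_atom)
        counts = torch.zeros(h_mono.size(0), 1).index_add_(
            0, atom2mono, torch.ones(h_atom.size(0), 1)).clamp(min=1)
        pooled = pooled / counts

        # (3) global backbone-level interactions between monosaccharides
        s, d = mono_edges
        mono_in = torch.zeros_like(h_mono).index_add_(0, d, self.mono_msg(h_mono[s]))
        h_mono = self.mono_update(torch.cat([mono_in, pooled], dim=-1), h_mono)
        return h_atom, h_mono

# Toy usage: two monosaccharides linked by one glycosidic bond, five atoms total.
dim = 16
layer = HierarchicalGlycanLayer(dim)
h_atom = torch.randn(5, dim)
h_mono = torch.randn(2, dim)
atom_edges = torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]])  # bidirectional atom bonds
mono_edges = torch.tensor([[0, 1], [1, 0]])              # backbone linkage, both directions
atom2mono = torch.tensor([0, 0, 0, 1, 1])                # which sugar unit each atom belongs to
h_atom, h_mono = layer(h_atom, h_mono, atom_edges, mono_edges, atom2mono)
```

Under the same assumptions, the multi-scale mask prediction pre-training would stack such layers and ask the model to recover masked atom tokens and masked monosaccharide tokens from the remaining graph; the details of the masking scheme follow the paper, not this sketch.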
Lay Summary: In this project, we teach machines about glycans. Glycans are an important class of biomolecules that play various biological roles in our lives. A glycan ultimately performs its functions through the atoms that compose it, so we study how a machine can understand glycan functions by modeling these atoms. We develop the GlycanAA model to capture the atomic-level structures of glycans. Furthermore, we endow GlycanAA with the knowledge contained in abundant unlabeled glycan structures, deriving the PreGlycanAA model.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/kasawa1234/GlycanAA
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Glycan Machine Learning, Heterogeneous Graph Modeling, Self-Supervised Pre-training
Submission Number: 1967