Dynamic and Chemical Constraints to Enhance Molecular Masked Graph Autoencoders

Published: 18 Sept 2025, Last Modified: 29 Oct 2025 · NeurIPS 2025 poster · CC BY 4.0
Keywords: molecule representation learning, graph auto-encoder, information bottleneck
TL;DR: We boost masked graph autoencoders for molecules by introducing DyCC, a dynamic, chemistry-aware training framework with adaptive masking (GIBMS) and soft label reconstruction (SLG).
Abstract: Masked Graph Autoencoders (MGAEs) have recently gained significant attention. Their proxy tasks typically corrupt the input graph at random and then reconstruct it. In the molecular domain, however, two main issues arise: (i) a predetermined mask ratio and fixed reconstruction objectives can lead to suboptimal performance or negative transfer because the resulting tasks are overly simple or overly complex, and (ii) these tasks may deviate from chemical priors. To tackle these challenges, we propose Dynamic and Chemical Constraints (DyCC) for MGAEs. DyCC includes a masking strategy, GIBMS, which preserves essential semantic information during graph masking while adaptively adjusting the mask ratio and masked content for each molecule. We further introduce a Soft Label Generator (SLG) that reconstructs masked tokens as learnable prototypes (soft labels) rather than hard labels. Both components adhere to chemical constraints and allow the proxy tasks to vary dynamically during training. We integrate the model-agnostic DyCC into various MGAEs and conduct comprehensive experiments, demonstrating significant performance improvements. Our code is available at \url{https://github.com/forever-ly/DyCC}.
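For intuition only, below is a minimal PyTorch sketch of the soft-label reconstruction idea the abstract describes: masked tokens are matched to a bank of learnable prototypes, and the decoder is trained against the resulting soft distribution instead of a hard atom-type label. All names (`SoftLabelGenerator`, `num_prototypes`, `temperature`) and design details are assumptions for illustration, not the authors' implementation; consult the linked repository for the actual SLG.

```python
# Hypothetical sketch of prototype-based soft labels for masked-token reconstruction.
# Not the authors' code; names and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftLabelGenerator(nn.Module):
    def __init__(self, hidden_dim: int, num_prototypes: int = 64, temperature: float = 0.1):
        super().__init__()
        # Learnable prototypes acting as a soft vocabulary for masked tokens.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, hidden_dim))
        self.temperature = temperature

    def forward(self, node_emb: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between node embeddings and prototypes,
        # softened into a distribution over prototypes (the soft label).
        z = F.normalize(node_emb, dim=-1)
        p = F.normalize(self.prototypes, dim=-1)
        logits = z @ p.t() / self.temperature
        return logits.softmax(dim=-1)


def soft_reconstruction_loss(decoder_logits: torch.Tensor,
                             soft_labels: torch.Tensor) -> torch.Tensor:
    # Match the decoder's predicted distribution to the soft labels via KL divergence,
    # rather than cross-entropy against a hard one-hot target.
    log_pred = decoder_logits.log_softmax(dim=-1)
    return F.kl_div(log_pred, soft_labels, reduction="batchmean")


if __name__ == "__main__":
    slg = SoftLabelGenerator(hidden_dim=128, num_prototypes=64)
    target_emb = torch.randn(32, 128)   # embeddings of 32 masked nodes (e.g. from a target encoder)
    decoder_out = torch.randn(32, 64)   # decoder logits over the 64 prototypes
    loss = soft_reconstruction_loss(decoder_out, slg(target_emb).detach())
    print(loss.item())
```

In this reading, the soft labels replace brittle hard targets (e.g. exact atom types), which is one plausible way a reconstruction objective can be kept dynamic and less prone to overly easy or overly hard proxy tasks.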
Primary Area: Machine learning for sciences (e.g. climate, health, life sciences, physics, social sciences)
Submission Number: 9290