TL;DR: We propose a novel framework, SpaceFormer, which incorporates the 3D space beyond atoms to enhance molecular pretrained representations.
Abstract: Molecular pretrained representations (MPR) have emerged as a powerful approach for addressing the challenge of limited supervised data in applications such as drug discovery and material design. While early MPR methods relied on 1D sequences and 2D graphs, recent advancements have incorporated 3D conformational information to capture rich atomic interactions. However, these prior models treat molecules merely as discrete atom sets, overlooking the space surrounding them. We argue from a physical perspective that modeling only these discrete points is insufficient. We first present a simple yet insightful observation: naively adding randomly sampled virtual points beyond atoms can surprisingly enhance MPR performance. In light of this, we propose a principled framework that incorporates the entire 3D space spanned by molecules. We implement the framework via a novel Transformer-based architecture, dubbed SpaceFormer, with three key components: (1) grid-based space discretization; (2) grid sampling/merging; and (3) efficient 3D positional encoding. Extensive experiments show that SpaceFormer significantly outperforms previous 3D MPR models across various downstream tasks with limited data, validating the benefit of leveraging the additional 3D space beyond atoms in MPR models.
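The abstract names grid-based space discretization and grid sampling but does not give their details. As a rough illustration only, the sketch below shows one way atoms could be mapped to cells of a regular 3D grid and a few non-atom cells drawn at random. The function names, the cell size, the padding, and the uniform sampling rule are all assumptions made for this example, not SpaceFormer's actual procedure; merging and positional encoding are not covered.

```python
import numpy as np


def discretize_to_grid(coords, cell_size=0.5, padding=2.0):
    """Map atom coordinates of shape (N, 3) to integer grid-cell indices.

    cell_size and padding are illustrative values, not taken from the paper.
    Returns (atom_cells, grid_shape, origin).
    """
    coords = np.asarray(coords, dtype=float)
    origin = coords.min(axis=0) - padding   # lower corner of the padded box
    upper = coords.max(axis=0) + padding    # upper corner of the padded box
    n_cells = np.ceil((upper - origin) / cell_size).astype(int)
    n_cells = np.maximum(n_cells, 1)
    atom_cells = np.floor((coords - origin) / cell_size).astype(int)
    atom_cells = np.clip(atom_cells, 0, n_cells - 1)  # guard against rounding
    grid_shape = tuple(int(n) for n in n_cells)
    return atom_cells, grid_shape, origin


def sample_empty_cells(atom_cells, grid_shape, n_samples, seed=0):
    """Uniformly sample grid cells that contain no atom (assumed rule)."""
    rng = np.random.default_rng(seed)
    total = int(np.prod(grid_shape))
    occupied = np.unique(np.ravel_multi_index(tuple(atom_cells.T), grid_shape))
    empty = np.setdiff1d(np.arange(total), occupied)
    n = min(n_samples, empty.size)
    chosen = rng.choice(empty, size=n, replace=False)
    return np.stack(np.unravel_index(chosen, grid_shape), axis=1)


# Hypothetical three-atom geometry (coordinates in angstroms).
atoms = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
atom_cells, grid_shape, origin = discretize_to_grid(atoms)
space_cells = sample_empty_cells(atom_cells, grid_shape, n_samples=16)
```

Under these assumptions, a model's input would be the atom cells plus the sampled empty cells, so the surrounding space is represented alongside the atoms themselves.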
Lay Summary: Molecular pretrained representations have emerged as a powerful approach for addressing the challenge of limited supervised data in applications such as drug discovery and material design. Early models for these tasks analyze molecules as simple strings of atoms or flat diagrams, while newer approaches study their 3D shapes (including atom types and coordinates). However, even these advanced methods focus only on the atoms themselves, ignoring the empty space around them. We argue from a physical perspective that modeling only these discrete atomic points is insufficient. We first present a simple yet insightful observation: naively adding randomly sampled points beyond atoms to represent the space can surprisingly improve performance. In light of this, we propose a principled framework, SpaceFormer, which incorporates the entire 3D space spanned by molecules rather than isolated atoms. Extensive experiments show that SpaceFormer significantly outperforms existing models across various downstream tasks with limited data, validating the benefit of leveraging the additional 3D space beyond atoms for molecular pretrained representations.
Primary Area: Applications->Chemistry, Physics, and Earth Sciences
Keywords: Molecular pretrained representation, Molecular Property
Submission Number: 5623