Keywords: computational chemistry, foundation models, embeddings, transformers, cheminformatics, representation learning, contrastive learning
TL;DR: The paper presents a new conformation-agnostic, task-agnostic dense embedding for molecular 3D structures obtained through contrastive learning.
Abstract: Recent years have seen growing interest in machine learning approaches to chemical tasks. The best existing methods build base models that combine molecular graphs (“2D structures”) with 3D atomic coordinates to predict molecular properties, typically through pre-training followed by fine-tuning on benchmark datasets. However, current approaches require retraining the entire model for each prediction task, using the published weights only as initialization. While this enables state-of-the-art performance, it limits practical deployment, as real-world datasets are often too small to support stable retraining of large models. Importantly, a molecule's 3D geometry holds crucial information for predicting its properties, but a single molecular graph usually corresponds to several 3D geometries, called conformers, introducing ambiguity into the inference process. Typical solutions sidestep this ambiguity by relying on the molecular graph alone, but that approach does not generalize easily beyond organic molecules. Here, we present ConforFormer, a method that explicitly accounts for the diversity of a molecule's 3D conformations to derive a task-agnostic and conformation-agnostic vector representation. The model serves as a foundational framework: its embeddings can be computed once and applied directly to downstream tasks, including property prediction and structural similarity, without extensive fine-tuning.
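The abstract does not give implementation details, but the contrastive objective it alludes to can be illustrated with a standard symmetric InfoNCE loss, where two embeddings of different conformers of the same molecule form a positive pair and the rest of the batch serves as negatives. The function name, the NumPy formulation, and the temperature value below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE over a batch of embedding pairs.

    z_a, z_b: (N, d) arrays; row i of each is an embedding of a
    different conformer of the same molecule (a positive pair),
    while all other rows in the batch act as negatives.
    Illustrative sketch only -- not the ConforFormer implementation.
    """
    # L2-normalise so the dot product is cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / temperature   # (N, N); positives on diagonal
    labels = np.arange(len(z_a))

    def xent(l):
        # numerically stable log-softmax cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[labels, labels].mean()

    # average the two retrieval directions (a -> b and b -> a)
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimising such a loss pulls the embeddings of a molecule's conformers together while pushing apart those of different molecules, which is one plausible route to the conformation-agnostic representation the paper describes.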
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 9671