Keywords: Pretraining, Graph Self-Supervised, Non-Contrastive SSL, Self-Predictive, Molecular Foundation Model
TL;DR: We propose C-FREE, a simple self-supervised framework that learns molecular representations by predicting subgraph embeddings, achieving state-of-the-art performance without negatives, augmentations, or complex objectives.
Abstract: High-quality molecular representations are essential for property prediction and molecular design, yet large labeled datasets remain scarce. Self-supervised pretraining on molecular graphs has shown promise, but existing approaches often rely on costly negative sampling, hand-crafted augmentations, or complex generative and latent prediction objectives. We introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple and effective framework that learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space. Motivated by the success of subgraph-based methods in supervised learning, C-FREE adopts fixed-radius ego-nets as the basic modeling unit and trains a hybrid Graph Neural Network (GNN)-Transformer backbone without negatives, positional encodings, or expensive pre-processing. Pretrained on the GEOM dataset, C-FREE achieves state-of-the-art performance on MoleculeNet, outperforming contrastive, generative, and more complex latent self-supervised learning techniques. Fine-tuning on the Kraken dataset further shows that pretraining on GEOM transfers effectively to new chemical domains, providing clear benefits over training from scratch.
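The abstract describes a latent self-prediction objective over ego-net subgraphs. The sketch below illustrates the general idea under simplifying assumptions: a toy mean-aggregation GNN stands in for the paper's hybrid GNN-Transformer backbone, graphs are given as dense adjacency matrices, and all names (`ToyGNN`, `ego_net_mask`, `cfree_style_loss`) and details such as mean pooling and a stop-gradient target are hypothetical choices, not the authors' implementation.

```python
# Minimal sketch of a contrast-free ego-net prediction objective in the spirit
# of C-FREE: pool an ego-net embedding and predict it from the embedding of its
# complementary neighborhood, with no negatives or augmentations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGNN(nn.Module):
    """Two rounds of mean-neighbor aggregation followed by linear projections."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.lin1(adj @ x / deg))
        h = F.relu(self.lin2(adj @ h / deg))
        return h


def ego_net_mask(adj: torch.Tensor, center: int, radius: int) -> torch.Tensor:
    """Boolean mask of nodes within `radius` hops of `center` (the ego-net)."""
    reach = torch.zeros(adj.size(0), dtype=torch.bool)
    reach[center] = True
    for _ in range(radius):
        reach = reach | (adj[reach].sum(0) > 0)
    return reach


def cfree_style_loss(encoder, predictor, x, adj, center, radius=1):
    """Predict the ego-net embedding from its complementary neighborhood.

    The loss is a cosine distance between the predicted embedding and a
    stop-gradient target embedding of the ego-net, so no negative samples
    are needed (the stop-gradient is an assumption of this sketch).
    """
    mask = ego_net_mask(adj, center, radius)
    h = encoder(x, adj)                    # node embeddings, shape (N, d)
    ego_emb = h[mask].mean(0)              # target: pooled ego-net embedding
    ctx_emb = h[~mask].mean(0)             # context: pooled complement embedding
    pred = predictor(ctx_emb)              # predict the ego-net from its context
    return 1.0 - F.cosine_similarity(pred, ego_emb.detach(), dim=0)


if __name__ == "__main__":
    # Toy random graph standing in for a molecular graph.
    N, d_in, d_hid = 12, 16, 32
    adj = (torch.rand(N, N) < 0.15).float()
    adj = ((adj + adj.T + torch.eye(N)) > 0).float()   # symmetrize, add self-loops
    x = torch.randn(N, d_in)
    encoder = ToyGNN(d_in, d_hid)
    predictor = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(), nn.Linear(d_hid, d_hid))
    loss = cfree_style_loss(encoder, predictor, x, adj, center=0)
    print(loss.item())
```

In practice one would sample many ego-net centers per molecule and average the losses; the key property the sketch preserves is that training signal comes only from latent prediction, with no negative pairs or hand-crafted augmentations.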
Submission Number: 31