Keywords: Pretraining, Graph Self-Supervised, Non-Contrastive SSL, Self-Predictive, Molecular Foundation Model
TL;DR: We propose C-FREE, a simple self-supervised framework that learns molecular representations by predicting subgraph embeddings, achieving state-of-the-art performance without negatives, augmentations, or complex objectives.
Abstract: High-quality molecular representations are essential for property prediction and molecular design, yet large labeled datasets remain scarce. Self-supervised pretraining on molecular graphs has shown promise, but existing approaches often rely on costly negative sampling, hand-crafted augmentations, or complex generative and latent prediction objectives. We introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple and effective framework that learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space. Motivated by the success of subgraph-based methods in supervised learning, C-FREE adopts fixed-radius ego-nets as the basic modeling unit and trains a hybrid Graph Neural Network (GNN)-Transformer backbone without negatives, positional encodings, or expensive pre-processing. Pretrained on the GEOM dataset, C-FREE achieves state-of-the-art performance on MoleculeNet, outperforming contrastive, generative, and more complex latent self-supervised learning techniques. Fine-tuning on the Kraken dataset further shows that pretraining on GEOM transfers effectively to new chemical domains, providing clear benefits over training from scratch.
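The abstract describes a latent self-prediction objective over ego-net subgraphs. The sketch below illustrates the general idea under simplifying assumptions: a toy mean-aggregation GNN stands in for the paper's hybrid GNN-Transformer backbone, graphs are given as dense adjacency matrices, and all names (`ToyGNN`, `ego_net_mask`, `cfree_style_loss`) and details such as mean pooling and a stop-gradient target are hypothetical choices, not the authors' implementation.

```python
# Minimal sketch of a contrast-free ego-net prediction objective in the spirit
# of C-FREE: pool an ego-net embedding and predict it from the embedding of its
# complementary neighborhood, with no negatives or augmentations.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGNN(nn.Module):
    """Two rounds of mean-neighbor aggregation followed by linear projections."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features; adj: (N, N) adjacency with self-loops.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        h = F.relu(self.lin1(adj @ x / deg))
        h = F.relu(self.lin2(adj @ h / deg))
        return h


def ego_net_mask(adj: torch.Tensor, center: int, radius: int) -> torch.Tensor:
    """Boolean mask of nodes within `radius` hops of `center` (the ego-net)."""
    reach = torch.zeros(adj.size(0), dtype=torch.bool)
    reach[center] = True
    for _ in range(radius):
        reach = reach | (adj[reach].sum(0) > 0)
    return reach


def cfree_style_loss(encoder, predictor, x, adj, center, radius=1):
    """Predict the ego-net embedding from its complementary neighborhood.

    The loss is a cosine distance between the predicted embedding and a
    stop-gradient target embedding of the ego-net, so no negative samples
    are needed (the stop-gradient is an assumption of this sketch).
    """
    mask = ego_net_mask(adj, center, radius)
    h = encoder(x, adj)                    # node embeddings, shape (N, d)
    ego_emb = h[mask].mean(0)              # target: pooled ego-net embedding
    ctx_emb = h[~mask].mean(0)             # context: pooled complement embedding
    pred = predictor(ctx_emb)              # predict the ego-net from its context
    return 1.0 - F.cosine_similarity(pred, ego_emb.detach(), dim=0)


if __name__ == "__main__":
    # Toy random graph standing in for a molecular graph.
    N, d_in, d_hid = 12, 16, 32
    adj = (torch.rand(N, N) < 0.15).float()
    adj = ((adj + adj.T + torch.eye(N)) > 0).float()   # symmetrize, add self-loops
    x = torch.randn(N, d_in)
    encoder = ToyGNN(d_in, d_hid)
    predictor = nn.Sequential(nn.Linear(d_hid, d_hid), nn.ReLU(), nn.Linear(d_hid, d_hid))
    loss = cfree_style_loss(encoder, predictor, x, adj, center=0)
    print(loss.item())
```

In practice one would sample many ego-net centers per molecule and average the losses; the key property the sketch preserves is that training signal comes only from latent prediction, with no negative pairs or hand-crafted augmentations.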
Submission Number: 31