Keywords: semi-supervised learning, self-supervised learning, ensemble, graph neural networks, ensemble distillation, geometric graph neural network
Abstract: Machine learning is transforming the molecular sciences by accelerating property prediction, simulation, and the discovery of new molecules and materials. Acquiring labeled data in these domains is often costly and time-consuming, whereas large collections of unlabeled molecular data are readily available. Semi-supervised learning (SSL) can exploit such unlabeled data, but standard SSL methods often rely on label-preserving augmentations, which are challenging to design in the molecular domain, where minor changes can drastically alter properties. In this work, we propose an augmentation-free SSL method for regression and classification. Grounded in ensemble learning, our approach introduces a consistency loss that penalizes disagreements with the ensemble consensus. We demonstrate that this training procedure boosts the predictive accuracy of both the ensemble and its individual models across diverse datasets, tasks, and architectures.
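The abstract does not specify the exact form of the consistency loss, so the following is only a hedged sketch of one plausible instantiation for regression: a squared-error penalty on each ensemble member's disagreement with the consensus (here taken as the members' mean prediction on unlabeled samples). The function name `consistency_loss` and the choice of mean-as-consensus are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def consistency_loss(predictions):
    """Illustrative consistency term: mean squared disagreement of each
    ensemble member from the ensemble consensus (assumed here to be the
    members' mean prediction).

    predictions: array-like of shape (n_models, n_samples), the per-model
    predictions on a batch of unlabeled inputs.
    """
    preds = np.asarray(predictions, dtype=float)
    consensus = preds.mean(axis=0)          # per-sample ensemble consensus
    return float(np.mean((preds - consensus) ** 2))

# Three hypothetical models predicting on two unlabeled samples.
preds = [[1.0, 2.0],
         [1.0, 2.0],
         [4.0, 2.0]]
# Consensus is [2.0, 2.0]; only the first sample shows disagreement.
print(consistency_loss(preds))  # → 1.0
```

In such a scheme the term would be added (with some weight) to the supervised loss, so that on unlabeled data each model is pulled toward the ensemble's agreed prediction without requiring any label-preserving augmentation.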
Submission Track: Paper Track (Short Paper)
Submission Category: All of the above
Institution Location: Copenhagen, Denmark
Submission Number: 23