Active Learning for Molecular Conformation Optimization with a Domain-Agnostic Neural Surrogate Oracle

ICLR 2026 Conference Submission 25013 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: energy minimization, conformational optimization, geometry optimization, graph neural networks, neural network potentials, active learning
TL;DR: We propose a data-efficient active learning framework for conformational energy minimization with neural network potentials and a domain-agnostic, trainable neural surrogate oracle.
Abstract: Molecular conformation optimization is crucial to computer-aided drug discovery and materials design, yet conventional force-based minimization with physics oracles (e.g., DFT) is prohibitively expensive. Neural network potentials (NNPs) can accelerate this process but typically require large quantum chemical datasets for training. To reduce data requirements, active learning (AL) approaches have been designed for this task. The state-of-the-art approach, GOLF, relies on a surrogate oracle to sample new data. However, that surrogate oracle is built on empirical molecular force fields, which require careful domain-specific tuning and limit generality. We introduce a new AL method for efficient conformation optimization that removes the dependency on empirical force fields. Our approach maintains two NNPs: an online NNP that performs conformation optimization and a target NNP that serves as a trainable surrogate oracle. The target network is an exponential moving average (EMA) of the online network. During active sampling, the target NNP supplies potential energy estimates that guide data acquisition, while periodic queries to the physics oracle provide ground-truth corrections. Unlike other AL approaches, our method requires no architectural changes to the NNP and adds minimal computational overhead compared to single-model AL pipelines. Across two challenging conformation-optimization benchmarks spanning different DFT levels, our method consistently outperforms a baseline NNP trained without AL, achieving substantial improvements with only ~1,000 additional conformations.
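The abstract describes the target network as an exponential moving average of the online network. A minimal sketch of that update rule follows. It is illustrative only: the function name `ema_update`, the dict-of-floats parameter representation, and the decay value are assumptions, since the abstract does not state the decay rate, the update schedule, or the NNP architecture.

```python
# Hypothetical sketch of an EMA target-network update.
# Parameters are plain dicts of floats standing in for NNP weights (an assumption).

def ema_update(target, online, decay=0.99):
    """Return new target parameters: decay * target + (1 - decay) * online."""
    return {
        name: decay * target[name] + (1.0 - decay) * online[name]
        for name in target
    }

# The target starts as a copy of the online network.
online = {"w": 1.0, "b": -2.0}
target = dict(online)

# Pretend a training step moved the online parameters.
online = {"w": 2.0, "b": 0.0}

# The target drifts slowly toward the online network.
target = ema_update(target, online, decay=0.9)
print(target)
```

A decay close to 1 keeps the target network changing slowly, which is the usual motivation for using an EMA copy as a more stable reference model. Whether the authors use this exact form is not stated in the abstract.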
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 25013