Keywords: Benchmark, Adaptive Design, Inverse Design, Generative Models, Agents, Materials Science, Materials Discovery
TL;DR: We propose a dynamic benchmark environment that evaluates autonomous agents in closed-loop discovery of thermodynamically stable materials.
Abstract: Existing benchmarks for computational materials discovery primarily evaluate static predictive tasks or isolated computational sub-tasks. Such approaches inadequately capture the inherently iterative, exploratory, and often serendipitous nature of scientific discovery.
We argue that the research community should shift evaluation practices towards dynamic benchmarks that more realistically represent materials discovery campaigns. As a concrete example, we propose an open-ended benchmark environment designed to simulate closed-loop discovery, requiring autonomous agents or algorithms to iteratively propose, evaluate, and refine candidates under a constrained evaluation budget. Specifically, it targets the efficient discovery of new thermodynamically stable compounds within chemical systems. Multiple fidelity levels are accommodated, from machine-learned interatomic potentials to density functional theory and experimental validation. This approach emphasizes realistic elements of scientific discovery, such as iterative refinement, adaptive decision-making, handling of uncertainty, and traversal of unknown chemical landscapes.
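To make the abstract's protocol concrete, the sketch below illustrates one way such a closed-loop, budget-constrained interaction could look. All names (DiscoveryEnv, evaluate, random_agent) and the random stand-in oracle are hypothetical illustrations under our own assumptions, not the benchmark's actual API; a real environment would back evaluate with an MLIP, DFT, or experiment, as the abstract describes.

```python
import random
from dataclasses import dataclass, field

@dataclass
class DiscoveryEnv:
    """Hypothetical closed-loop environment: an agent spends a fixed
    evaluation budget querying a stability oracle for a chemical system."""
    chemical_system: tuple                      # e.g. ("Li", "Fe", "O")
    budget: int = 100                           # total oracle calls allowed
    discovered: list = field(default_factory=list)

    def evaluate(self, composition: dict) -> float:
        """Stand-in oracle. A real benchmark would dispatch to an MLIP,
        DFT, or experimental validation at the chosen fidelity level;
        here we return a random energy above hull (eV/atom)."""
        if self.budget <= 0:
            raise RuntimeError("evaluation budget exhausted")
        self.budget -= 1
        e_above_hull = random.uniform(0.0, 0.5)
        if e_above_hull < 0.025:                # a common stability threshold
            self.discovered.append(composition)
        return e_above_hull

def random_agent(env: DiscoveryEnv) -> None:
    """Baseline agent: propose random compositions until the budget runs
    out. An adaptive agent would instead refine its proposals using the
    accumulated (composition, energy) history."""
    history = []
    while env.budget > 0:
        candidate = {el: random.randint(1, 4) for el in env.chemical_system}
        history.append((candidate, env.evaluate(candidate)))

env = DiscoveryEnv(chemical_system=("Li", "Fe", "O"), budget=100)
random_agent(env)
print(f"stable candidates found: {len(env.discovered)}")
```

Under this framing, an agent's score is determined by how many stable compounds it surfaces within the budget, which is what rewards iterative refinement over one-shot prediction.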
Submission Track: Benchmarking in AI for Materials Design - Short Paper
Submission Category: AI-Guided Design
AI4Mat Journal Track: Yes
AI4Mat RLSF: Yes
Submission Number: 139