Abstract: Optimizing the structure of molecules to achieve desired properties is a central bottleneck across the chemical sciences, particularly in the pharmaceutical industry, where it underlies the discovery of new drugs. Because molecular property evaluation often relies on costly, rate-limited oracles such as experimental assays, molecular optimization must be highly sample-efficient. To address this, we introduce SEISMO, an LLM agent that performs strictly online, inference-time molecular optimization: it updates after every oracle call, without the need for population-based or batched learning. SEISMO conditions each proposal on the full optimization trajectory, combining natural-language task descriptions with scalar scores and, when available, structured explanatory feedback. Across the 23 tasks of the Practical Molecular Optimization (PMO) benchmark, SEISMO achieves a 2–3 times higher area under the optimization curve than prior methods, often reaching near-maximal task scores within 50 oracle calls.
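The area under the optimization curve rewards methods that find good molecules early, not just eventually. As a rough illustration, the sketch below computes a normalized top-k AUC in the style commonly used with PMO. The function name `top_k_auc`, the default k, normalization by the full oracle budget, and padding an early-stopped run with its last value are all assumptions here, not details taken from SEISMO's evaluation.

```python
import heapq
from typing import Sequence


def top_k_auc(scores: Sequence[float], budget: int, k: int = 10) -> float:
    """Normalized area under the running top-k mean curve (illustrative sketch).

    scores: oracle scores in call order, assumed to lie in [0, 1].
    budget: oracle-call budget used for normalization (assumption).
    k: number of best molecules averaged at each step (k=10 is an assumption).
    """
    if budget <= 0 or k <= 0:
        raise ValueError("budget and k must be positive")
    best: list[float] = []  # min-heap of the k best scores seen so far
    curve: list[float] = []  # running top-k mean after each oracle call
    for s in list(scores)[:budget]:
        if len(best) < k:
            heapq.heappush(best, s)
        elif s > best[0]:
            heapq.heapreplace(best, s)  # drop the worst of the current top k
        # Average over the molecules available so far (fewer than k early on).
        curve.append(sum(best) / len(best))
    # Assumption: a run that stops early keeps its last value until the budget ends.
    last = curve[-1] if curve else 0.0
    curve.extend([last] * (budget - len(curve)))
    # Each call contributes unit width, so the mean equals the area divided by budget.
    return sum(curve) / budget
```

For example, `top_k_auc([0.0, 1.0], budget=2, k=1)` returns 0.5: the running best is 0.0 after the first call and 1.0 after the second. A run that reaches the same final score one call earlier scores higher, which is why this metric favors sample-efficient methods.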
Our additional medicinal-chemistry tasks show that providing explanatory feedback further improves sample efficiency, demonstrating that leveraging domain knowledge and structured information is key to sample-efficient molecular optimization.
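The strictly online loop summarized above can be illustrated with a minimal, hypothetical sketch. It assumes a `propose` function that wraps an LLM call, an `oracle` that returns a scalar score, and an optional `explain` function that returns textual feedback. None of these names, nor the prompt format in `to_prompt`, come from SEISMO itself. The sketch only shows the control flow: one proposal and one oracle call per step, with every result (and any feedback) appended to the trajectory before the next proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    smiles: str  # proposed molecule (SMILES string)
    score: float  # scalar oracle score
    feedback: Optional[str] = None  # structured explanation, if available


@dataclass
class Trajectory:
    task: str  # natural-language task description
    steps: list[Step] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the task and the full history as text (hypothetical format)."""
        lines = [f"Task: {self.task}"]
        for i, st in enumerate(self.steps, start=1):
            line = f"{i}. {st.smiles} score={st.score:.3f}"
            if st.feedback:
                line += f" feedback: {st.feedback}"
            lines.append(line)
        return "\n".join(lines)


def optimize(
    task: str,
    propose: Callable[[str], str],  # hypothetical wrapper around an LLM call
    oracle: Callable[[str], float],  # expensive, rate-limited property evaluation
    budget: int,
    explain: Optional[Callable[[str], str]] = None,  # optional explanatory feedback
) -> Trajectory:
    traj = Trajectory(task)
    for _ in range(budget):
        # Condition on the whole trajectory so far, not on a batch or population.
        smiles = propose(traj.to_prompt())
        score = oracle(smiles)  # exactly one oracle call per iteration
        fb = explain(smiles) if explain is not None else None
        # Update immediately so the very next proposal sees this result.
        traj.steps.append(Step(smiles, score, fb))
    return traj
```

In practice `propose` would send the rendered history to the language model. Because each result is appended before the next proposal, the loop never waits for a batch to finish. Passing `explain` adds feedback to the same history, so the model can exploit it on the very next call.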