Keywords: Machine Learning, Large language models, Chemistry, Retrosynthesis, Search
Abstract: Large language models (LLMs) have been the focal point of enormous development in artificial intelligence over the past half decade, recently achieving human-level performance on mathematics and programming benchmarks. Despite this, performance improvements on chemical tasks have emerged at a somewhat slower pace. In this work, we investigate the capabilities of LLMs in chemical search to address two central problems in AI-driven synthesis: retrosynthetic planning and mechanism elucidation. In our approach, the search environment builds options and the LLM serves as a guidance function that evaluates the validity and potential of a partially constructed solution. This is advantageous because LLMs can digest arbitrary inputs and optimize for arbitrary requirements. We first show that LLMs can analyze and reason about chemical entities such as molecules and reactions, and then leverage these capabilities in both problems. Our results show that LLMs can accurately reason about chemical entities in both local and global terms, analyzing single reactions as well as whole synthetic routes, and that such capabilities can be exploited through search algorithms to solve chemical problems in more flexible terms.
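The division of labor described in the abstract, where the environment proposes options and a guidance function ranks partially constructed solutions, can be illustrated with a generic best-first search. This is only a minimal sketch under stated assumptions, not the authors' implementation: `guided_search`, `expand`, `score`, and `is_solved` are hypothetical names, the search strategy is a plain best-first search chosen for illustration, and the toy `score` function below stands in for what would be an LLM call in the actual system.

```python
import heapq
from typing import Callable, Iterable, List, Optional, Tuple

# A partial solution, represented here as a sequence of steps (assumption).
State = Tuple[str, ...]


def guided_search(
    start: State,
    expand: Callable[[State], Iterable[State]],   # environment: builds options
    score: Callable[[State], float],              # guidance: higher is better
    is_solved: Callable[[State], bool],
    max_expansions: int = 1000,
) -> Optional[State]:
    """Best-first search: always expand the highest-scoring partial solution."""
    counter = 0  # tie-breaker so the heap never compares states directly
    frontier: List[Tuple[float, int, State]] = [(-score(start), counter, start)]
    for _ in range(max_expansions):
        if not frontier:
            return None  # every option was explored without success
        _, _, state = heapq.heappop(frontier)
        if is_solved(state):
            return state
        for child in expand(state):
            counter += 1
            # heapq is a min-heap, so scores are negated to pop the best first.
            heapq.heappush(frontier, (-score(child), counter, child))
    return None  # expansion budget exhausted


# Toy stand-ins (hypothetical): rebuild a target sequence one step at a time.
TARGET: State = ("a", "b", "c")


def toy_expand(state: State) -> List[State]:
    """Environment: propose every one-step extension of a partial solution."""
    if len(state) >= len(TARGET):
        return []
    return [state + (step,) for step in "abc"]


def toy_score(state: State) -> float:
    """Placeholder for the LLM guidance call: count positions matching TARGET."""
    return float(sum(1 for got, want in zip(state, TARGET) if got == want))


result = guided_search((), toy_expand, toy_score, lambda s: s == TARGET)
```

Keeping `expand` and `score` as separate callables mirrors the separation in the abstract: the set of candidate moves is fixed by the environment, while the scoring function can be swapped out or re-prompted for different requirements without touching the search loop.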
Submission Number: 174