Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving

28 Sept 2024 (modified: 16 Oct 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Chemistry, LLMs, Agent, AI4Science, AI4Chemistry, Tool-Augmented LLMs
TL;DR: A comprehensive evaluation of LLM agents on chemistry tasks, revealing key findings and challenges.
Abstract: Large language models (LLMs) have shown promise in various domains but face challenges in chemistry due to limited domain knowledge and computational capabilities. To address these issues, tool-augmented language agents such as ChemCrow and Coscientist have been developed. However, their evaluations remain narrow in scope, leaving an unclear picture of how these tool-augmented agents perform across real-world applications. In this study, we conduct a comprehensive evaluation to bridge this gap. Specifically, we develop ChemAgent, the most capable chemistry agent to date, equipped with 29 tools that handle a wide spectrum of tasks. We then assess it across three datasets, namely SMolInstruct, MMLU-Chemistry, and GPQA-Chemistry, which can be categorized into specialized chemistry tasks and general chemistry questions. Surprisingly, tool-augmented agents do not consistently outperform the base LLM without tools, and the impact of tool augmentation is highly task-dependent: it provides substantial gains on specialized chemistry tasks but can hinder performance on general chemistry questions. We further engage domain experts and conduct an error analysis, revealing that errors on general chemistry questions primarily stem from minor inaccuracies at intermediate stages of the problem-solving process. These findings highlight the need for further research into balancing tool use with intrinsic reasoning abilities to maximize the effectiveness of language agents in chemistry.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12696