ChemMA: Multi-Agent Chemical Reasoning with Tools

ACL ARR 2026 March Submission 2259 Authors

17 Mar 2026 (modified: 07 Jun 2026), ACL ARR 2026 March Submission, CC BY 4.0

Keywords: Retrieval-Augmented Generation, Multi-Agent Systems, Tool Learning, Chemical Reasoning

Abstract: Large language models show promise for chemistry tasks but often suffer from hallucination, inefficient retrieval, and invalid reasoning. Retrieval-Augmented Generation improves factual grounding, yet existing approaches rely on static retrieval and lack principled stopping criteria, leading to irrelevant context accumulation and poor multi-step reasoning. We present ChemMA, an agentic retrieval-augmented framework for chemical reasoning. ChemMA integrates a Hybrid Perception module that enforces symbolic grounding against noisy inputs, a Planner that orchestrates dependency-aware actions rooted in structured memory for gap-driven reasoning, and an Executor that interfaces with a heterogeneous toolchain while filtering retrieval noise via embedded knowledge distillation. To ensure scientific rigor, we employ a dual-phase verification protocol where a Judge governs early termination based on information sufficiency, while a Jury performs final semantic consensus checks. This design enables adaptive retrieval and chemically valid reasoning while avoiding unnecessary information acquisition. Experiments on four chemistry benchmarks totaling 1,932 questions show that ChemMA consistently outperforms traditional RAG baselines and significantly narrows the gap with proprietary-model systems using only open-source language models. These results highlight the effectiveness of sufficiency-aware agentic control combined with structured architectural constraints for reliable chemical reasoning.
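
Illustrative sketch (not the authors' implementation): the control flow described in the abstract, in which a Planner and Executor gather evidence until a Judge deems it sufficient and a Jury then settles the final answer, might look roughly like the following. Every name here (`Memory`, `answer_with_sufficiency_control`, the `toy_*` stand-ins, the `max_steps` cap) is hypothetical, and majority voting is only an assumed stand-in for the paper's "semantic consensus" check.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical structured memory: the evidence gathered so far."""
    evidence: List[str] = field(default_factory=list)


def answer_with_sufficiency_control(
    question: str,
    plan_next: Callable[[str, Memory], str],
    execute: Callable[[str], str],
    judge: Callable[[str, Memory], bool],
    jurors: List[Callable[[str, Memory], str]],
    max_steps: int = 5,
) -> Tuple[str, int]:
    """Gather evidence until the judge reports sufficiency, then take a jury vote."""
    memory = Memory()
    steps = 0
    # Phase 1: the judge can stop retrieval early once the evidence suffices.
    while steps < max_steps and not judge(question, memory):
        action = plan_next(question, memory)      # planner picks the next action
        memory.evidence.append(execute(action))   # executor runs a tool or retriever
        steps += 1
    # Phase 2: jurors answer independently; the most common answer wins
    # (assumed consensus rule, chosen only for illustration).
    votes = Counter(juror(question, memory) for juror in jurors)
    answer, _ = votes.most_common(1)[0]
    return answer, steps


# Toy stand-ins so the sketch runs end to end; real components would call models and tools.
def toy_plan(question: str, memory: Memory) -> str:
    return f"lookup-{len(memory.evidence)}"


def toy_execute(action: str) -> str:
    return f"result of {action}"


def toy_judge(question: str, memory: Memory) -> bool:
    return len(memory.evidence) >= 2  # "sufficient" after two pieces of evidence


toy_jurors = [
    lambda q, m: "ethanol",
    lambda q, m: "ethanol",
    lambda q, m: "methanol",
]

answer, steps = answer_with_sufficiency_control(
    "Which alcohol is C2H5OH?", toy_plan, toy_execute, toy_judge, toy_jurors
)
print(answer, steps)
```

In this sketch the loop stops as soon as the judge is satisfied rather than after a fixed number of retrievals, which is the property the abstract credits with avoiding unnecessary information acquisition.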

Paper Type: Long

Research Area: LLM agents

Research Area Keywords: multi-agent systems, retrieval-augmented generation, tool use, planning in agents, neurosymbolic reasoning

Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models

Languages Studied: English

Submission Number: 2259
