ChemCoScientist: LLM-Based Multi-Agent Assistant for Automated Solving of Chemical Tasks Using Data-Driven Tools

Gleb V. Solovev, Ivan Gurev, Anastasia Vepreva, Ivan Dubrovsky, Alina B. Zhidkovskaya, Kamil Fatkhiev, Elizaveta Lutsenko, Anastasia Orlova, Nina Gubina, Nikolay O. Nikitin, Andrei Dmitrenko, Anna V. Kalyuzhnaya

Published: 2025, Last Modified: 05 May 2026, ICDM (Workshops) 2025, CC BY-SA 4.0
Abstract: This paper introduces a multi-agent system designed to automate data-intensive machine learning workflows. Using drug discovery as a case study, we deploy specialized agents to execute a pipeline comprising: (1) targeted information retrieval from scientific literature, (2) automated data preparation informed by the extracted knowledge, and (3) the training of both predictive and generative models. Our results demonstrate that the system successfully automates complex data curation and model training, achieving performance comparable to that of manually engineered pipelines. This work underscores the critical role of automated data mining in enabling robust, end-to-end problem-solving in data-scarce domains. ChemCoScientist is available at https://github.com/ITMO-NSS-team/CoScientist.