Reliable Data Science Analysis with Large Language Models via Multi-Agent Tools Orchestration
Keywords: LLM, Code generation, Multi-agent, Data science
TL;DR: We reframe LLM code generation from an open-ended 'essay question' into a structured tool selection problem, where the model makes decisions by analyzing full tool source code, addressing code unreliability at its root.
Abstract: While Large Language Models (LLMs) show promise for automating the labor-intensive process of data science analysis, their practical application is undermined by the generation of erroneous and unreliable code. We argue this stems from treating LLMs as open-ended code generators—a task akin to answering an essay question. We propose a fundamental paradigm shift: our framework reframes the task as structured tool selection and parameterization, effectively turning the 'essay question' into a sequence of 'multiple-choice and fill-in-the-blank' problems. This dramatically reduces the potential for error. Our contributions are twofold. First, a multi-agent framework orchestrates the workflow, breaking complex tasks into verifiable steps. Second, we construct a comprehensive tool library—dynamically curated by auto-generating and solving problems to yield 223 functions across 16 categories—which serves as the foundation for a novel code-aware invocation mechanism that empowers the LLM to select tools by analyzing their full source code, not just brief descriptions. On the InfiAgent-DABench benchmark, our method achieves 98.43% accuracy, outperforming all baseline methods. Furthermore, our framework with a weaker model achieves 93.7% accuracy, a stark contrast to the 50% from baseline methods, demonstrating that success hinges on our structured approach, not model strength.
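The following is a minimal sketch of the tool-selection-and-parameterization reframing described in the abstract; all names here (ToolLibrary, select_and_fill, call_llm, and the example tools) are illustrative assumptions, not the paper's actual API. The LLM is shown the full source code of each registered tool and asked only to choose a tool and fill in its arguments, while the verified tool code is executed locally.

```python
# Illustrative sketch only: hypothetical names, not the paper's implementation.
import inspect
import json
from typing import Any, Callable, Dict


def mean_of_column(rows: list, column: str) -> float:
    """Compute the arithmetic mean of a numeric column."""
    values = [row[column] for row in rows]
    return sum(values) / len(values)


def count_missing(rows: list, column: str) -> int:
    """Count rows where the column value is None."""
    return sum(1 for row in rows if row[column] is None)


class ToolLibrary:
    """Registry that exposes each tool's full source code to the LLM."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, fn: Callable[..., Any]) -> None:
        self._tools[fn.__name__] = fn

    def catalog(self) -> str:
        # The LLM sees complete implementations, not just brief descriptions.
        return "\n\n".join(inspect.getsource(fn) for fn in self._tools.values())

    def invoke(self, name: str, kwargs: Dict[str, Any]) -> Any:
        return self._tools[name](**kwargs)


def select_and_fill(question: str, library: ToolLibrary, call_llm) -> Any:
    """Ask the LLM to pick a tool (multiple choice) and fill its arguments
    (fill-in-the-blanks), then run the trusted tool code locally."""
    prompt = (
        "Available tools (full source):\n" + library.catalog()
        + f"\n\nQuestion: {question}\n"
        + 'Reply with JSON only: {"tool": <name>, "kwargs": {...}}'
    )
    choice = json.loads(call_llm(prompt))  # call_llm: any chat-completion wrapper
    return library.invoke(choice["tool"], choice["kwargs"])
```

Because the model never emits free-form code, its output space collapses to a tool name plus a small set of argument values, which is what the abstract frames as turning an 'essay question' into 'multiple-choice and fill-in-the-blank' problems.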
Area: Engineering and Analysis of Multiagent Systems (EMAS)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 486