Abstract: LLM-based agent systems have made remarkable progress in automatically solving natural language processing tasks, yet they are typically limited to simple sequence-to-sequence generation scenarios. Real-world task environments, however, often involve multi-document workspaces that agents must explore to achieve specific goals through context-aware information processing. To make LLMs more effective at end-to-end complex data science tasks, we propose DS-Agent, a novel LLM-based agent framework inspired by human problem-solving cognition. Our architecture explores the workspace through external tools and generates code/SQL to fulfill task objectives. Equipped with customized information retrieval tools, DS-Agent acquires and filters multi-source information from the workspace, significantly improving the quality of contextual information. Furthermore, its multi-agent architecture implements context partitioning and isolation mechanisms that support dynamic pruning during task planning, preventing individual agents from entering ineffective recursive iterations. We demonstrate the effectiveness of DS-Agent on agent-based data science tasks, where it achieves state-of-the-art accuracy across multiple models. DS-Agent powered by GPT-4o reaches an accuracy of 42.26\%, a 10.01\% improvement over the baseline methods.
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: LLM agents, prompting, Generation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data analysis
Languages Studied: N/A
Submission Number: 7393