Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

ACL ARR 2025 May Submission 2186 Authors

18 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0

Abstract: Industries such as finance, meteorology, and energy generate vast amounts of heterogeneous data daily. Efficiently managing, processing, and visualizing such data is labor-intensive and frequently requires specialized expertise. Leveraging large language models (LLMs) to develop an automated workflow is a highly promising solution. However, LLMs are not adept at complex numerical computations and table manipulations, and they are further constrained by a limited context length. To bridge this gap, we propose Data-Copilot, a data analysis agent that autonomously performs data querying, processing, and visualization tailored to diverse human requests. The advancements are twofold. First, it is a code-centric agent that uses code as an intermediary to process and visualize massive data based on human requests, achieving automated large-scale data analysis. Second, Data-Copilot runs a data exploration phase in advance, which autonomously explores how to design universal and error-free interfaces from data, reducing the error rate in real-time responses. Specifically, it imitates common requests from data sources, abstracts them into universal interfaces (code modules), optimizes their functionality, and validates their effectiveness. For real-time requests, Data-Copilot invokes these interfaces to address user intent. Compared to generating code from scratch, invoking these pre-designed and well-validated interfaces can significantly reduce errors during real-time requests. We open-sourced Data-Copilot with massive Chinese financial data, such as stocks, funds, and news. Quantitative evaluations indicate that our exploration-deployment strategy addresses human requests more accurately and efficiently, with good interpretability.
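The exploration-deployment split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the registry, the function names (`register_if_valid`, `invoke`, `pct_change`, `broken_mean`), and the test cases are all hypothetical, and the sketch assumes that "validation" means running each candidate interface against known input/output pairs before it becomes callable at request time.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry: interface name -> validated function.
REGISTRY: Dict[str, Callable] = {}


def register_if_valid(func: Callable, cases: List[Tuple[tuple, object]]) -> bool:
    """Exploration phase: keep an interface only if it passes every test case."""
    for args, expected in cases:
        try:
            if func(*args) != expected:
                return False
        except Exception:
            return False
    REGISTRY[func.__name__] = func
    return True


def pct_change(prices: List[float]) -> List[float]:
    """Candidate interface: period-over-period percentage change."""
    return [round((b - a) / a * 100, 2) for a, b in zip(prices, prices[1:])]


def broken_mean(values: List[float]) -> float:
    """Candidate interface with a deliberate bug: divides by the wrong count."""
    return sum(values) / (len(values) + 1)


def invoke(name: str, *args):
    """Deployment phase: call a pre-validated interface instead of generating new code."""
    if name not in REGISTRY:
        raise KeyError(f"no validated interface named {name!r}")
    return REGISTRY[name](*args)


# Exploration: one candidate passes validation, the buggy one is rejected.
accepted = register_if_valid(pct_change, [(([100.0, 110.0, 99.0],), [10.0, -10.0])])
rejected = register_if_valid(broken_mean, [(([2.0, 4.0],), 3.0)])

# Deployment: a real-time request is served by the validated interface.
result = invoke("pct_change", [50.0, 55.0])
```

In this sketch the buggy candidate never reaches the registry, so a real-time request can only be routed to code that has already passed its checks, which is the intuition behind the claimed reduction in errors relative to generating code from scratch.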

Paper Type: Long

Research Area: NLP Applications

Research Area Keywords: LLM/AI agents

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 2186
