Abstract: Recent advances in agentic systems for data analysis have emphasized automating insight generation through multi-agent frameworks and orchestration layers. While these systems effectively manage tasks such as query translation, data transformation, and visualization, they often overlook the structured reasoning process underlying analytical thinking. Reasoning large language models (LLMs) used for multi-step problem solving are trained as general-purpose problem solvers; as a result, their reasoning steps do not adhere to fixed processes for specific tasks. Real-world data analysis, however, requires a consistent cognitive workflow: interpreting vague goals, grounding them in contextual knowledge, constructing abstract plans, and adapting execution based on intermediate outcomes. We introduce I2I-STRADA (Information-to-Insight via Structured Reasoning Agent for Data Analysis), an agentic architecture designed to formalize this reasoning process. I2I-STRADA models how analysis unfolds through modular sub-tasks that reflect the cognitive steps of analytical reasoning. Evaluations on the DABstep and DABench benchmarks show that I2I-STRADA outperforms prior systems in planning coherence and insight alignment, highlighting the importance of structured cognitive workflows in agent design for data analysis.
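To make the workflow described in the abstract concrete, the following is a minimal, hypothetical Python sketch of a structured cognitive pipeline (interpret goal, ground in context, build an abstract plan, adapt execution). All names here (AnalysisState, interpret_goal, etc.) are illustrative assumptions and do not reflect the I2I-STRADA implementation, which is not distributed with this submission.

```python
# Hypothetical illustration only -- not the authors' I2I-STRADA code.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AnalysisState:
    goal: str                                      # possibly vague user objective
    context: dict = field(default_factory=dict)    # grounded domain knowledge
    plan: List[str] = field(default_factory=list)  # abstract analysis steps
    findings: List[str] = field(default_factory=list)


def interpret_goal(state: AnalysisState) -> AnalysisState:
    # Turn a vague request into an explicit analytical question.
    state.context["question"] = f"clarified form of: {state.goal}"
    return state


def ground_in_context(state: AnalysisState) -> AnalysisState:
    # Attach the schema/domain knowledge needed to answer the question.
    state.context["schema"] = ["transactions", "merchants"]  # placeholder metadata
    return state


def build_plan(state: AnalysisState) -> AnalysisState:
    # Construct an abstract plan before any execution takes place.
    state.plan = ["load data", "transform", "aggregate", "summarize"]
    return state


def execute_and_adapt(state: AnalysisState) -> AnalysisState:
    # Execute steps one at a time; a real agent would inspect each
    # intermediate outcome and revise the remaining plan when needed.
    while state.plan:
        step = state.plan.pop(0)
        outcome = f"completed: {step}"  # stand-in for tool/code execution
        state.findings.append(outcome)
        if outcome.startswith("failed") and "repair step" not in state.plan:
            state.plan.insert(0, "repair step")  # adaptive replanning hook
    return state


WORKFLOW: List[Callable[[AnalysisState], AnalysisState]] = [
    interpret_goal, ground_in_context, build_plan, execute_and_adapt,
]


def run(goal: str) -> AnalysisState:
    state = AnalysisState(goal=goal)
    for stage in WORKFLOW:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run("Which merchant category drives the most fee disputes?").findings)
```

In an actual system each stage would call an LLM and execution tools; the point of the sketch is that the stage ordering is fixed, which is what distinguishes a structured cognitive workflow from free-form LLM reasoning.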
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: data analysis, reasoning, large language model, planning
Contribution Types: NLP engineering experiment, Data analysis
Languages Studied: English natural language
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: N/A
A2 Elaboration: Risks associated with LLMs require guardrails that are specific to application and organizational requirements. Our work covers only the algorithmic view.
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Datasets used for benchmarking are mentioned under Sections 4.1 and 4.2. We are not distributing the code we created for our agent.
B2 Discuss The License For Artifacts: N/A
B2 Elaboration: We are not distributing any artifacts as part of this work. We have utilized open-source datasets for benchmarking.
B3 Artifact Use Consistent With Intended Use: N/A
B3 Elaboration: We are not distributing any artifacts as part of this work. We have utilized open-source datasets for benchmarking as per their intended use.
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B4 Elaboration: We have utilized open-source datasets that have been used for benchmarking in various other works.
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: Yes
B6 Elaboration: Section 4
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section 4.1 - Anthropic's Claude 3.5 Sonnet was used to implement our agentic framework. The model is accessed via AWS Bedrock APIs, which do not require dedicated server/GPU provisioning.
C2 Experimental Setup And Hyperparameters: N/A
C2 Elaboration: N/A since we use default hyperparameters with the Bedrock APIs.
C3 Descriptive Statistics: Yes
C3 Elaboration: Sections 4.1 and 4.2
C4 Parameters For Packages: Yes
C4 Elaboration: In Section 4.1 we mention the model used.
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: No
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: Yes
Submission Number: 1461