TL;DR: ChartAgent is a plug‑and‑play, agent‑based framework with a two‑stage pipeline: first a chart‑to‑table pretrained VLM generates an initial table from a chart image, then a ReAct LLM‑based agent iteratively corrects it, optionally using a novel zooming tool for fine‑grained inspection. Evaluated on ChartQA, ChartAgent consistently outperforms VLM‑only and single‑pass correction baselines in header alignment, numerical fidelity, and overall table quality, all without any additional fine‑tuning.
Abstract: Extracting structured tables from chart images is a challenging task that underpins numerous downstream document analysis applications. While previous studies have demonstrated that multimodal large language models (MLLMs) and vision-language models (VLMs) can convert charts into tables, these models frequently fail to adhere to strict formatting standards, omit fine-grained labels, or introduce numerical inaccuracies. In this work, we introduce ChartAgent, a plug-and-play, agent-based framework that augments any off-the-shelf VLM through a two-stage agentic pipeline. In the first stage, a chart-to-table pretrained VLM generates an initial table directly from the chart image. In the second stage, a ReAct LLM-based agent iteratively corrects the generated table by cross-verifying visual regions and textual entries. This agent can optionally use a novel zooming tool designed for detailed and precise inspection of complex, densely packed chart areas. To evaluate the effectiveness of ChartAgent, we benchmark its performance on the ChartQA dataset against state-of-the-art methods. Our experiments demonstrate consistent improvements over both VLM-only and single-pass correction baselines across structural and numerical metrics. The modular design of ChartAgent enables seamless integration with any VLM without requiring additional fine-tuning. This approach significantly enhances header alignment, numerical fidelity, and overall table quality, providing a robust and efficient solution for accurate chart-to-table extraction.
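For concreteness, below is a minimal sketch of the two-stage pipeline described in the abstract, assuming the chart-to-table VLM and the ReAct agent are exposed as injected callables and that the zooming tool is a simple image crop. The function names, action schema, and stopping criterion are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a ChartAgent-style two-stage pipeline (illustrative only).
# `vlm_to_table` and `react_step` are hypothetical callables standing in for
# the chart-to-table VLM and the ReAct LLM agent; their interfaces are assumed.
from typing import Callable
from PIL import Image


def zoom(image: Image.Image, box: tuple[int, int, int, int]) -> Image.Image:
    """Zooming tool: crop a dense chart region for fine-grained inspection."""
    return image.crop(box)


def chart_agent(
    chart_path: str,
    vlm_to_table: Callable[[Image.Image], str],      # Stage 1: chart-to-table VLM
    react_step: Callable[[Image.Image, str], dict],  # Stage 2: one ReAct step
    max_iters: int = 5,
) -> str:
    image = Image.open(chart_path)

    # Stage 1: initial table generated directly from the chart image.
    table = vlm_to_table(image)

    # Stage 2: ReAct agent cross-verifies visual regions against table entries
    # and iteratively corrects the table, optionally zooming into dense areas.
    view = image                                     # agent starts from the full chart
    for _ in range(max_iters):
        step = react_step(view, table)               # e.g. {"action": "zoom", "box": (...)}
        if step["action"] == "zoom":                 # inspect a region more closely
            view = zoom(image, step["box"])
        elif step["action"] == "edit":               # apply the agent's corrected table
            table, view = step["table"], image
        else:                                        # "stop": table judged consistent
            break
    return table
```

Injecting the VLM and agent as callables mirrors the plug-and-play claim: any off-the-shelf model can be dropped in without fine-tuning, since the correction loop only depends on the table text and the chart image.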
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: vision language navigation; cross-modal pretraining; image text matching; cross-modal content generation; vision question answering; cross-modal application; cross-modal information extraction; multimodality
Languages Studied: English
Previous URL: https://openreview.net/forum?id=SyHGjderZ8
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: Yes, I want a different area chair for our submission
Reassignment Request Reviewers: Yes, I want a different set of reviewers
Software: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: No
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 4
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: N/A
B5 Documentation Of Artifacts: N/A
B6 Statistics For Data: N/A
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 4
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 4
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: N/A
D Human Subjects Including Annotators: No
D1 Instructions Given To Participants: N/A
D2 Recruitment And Payment: N/A
D3 Data Consent: N/A
D4 Ethics Review Board Approval: N/A
D5 Characteristics Of Annotators: N/A
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: AI assistance was used only for grammar and vocabulary correction.
Author Submission Checklist: yes
Submission Number: 1370