MatDeplot: Agent-Ready Materials-Curve Understanding for Scientific Reasoning

Published: 30 May 2026, Last Modified: 30 May 2026
Venue: ICML2026-AI4Science Poster
License: CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: Scientific Chart Understanding, Vision-Language Models, Agentic Scientific Reasoning
TL;DR: MatDeplot converts materials-science line plots into (x, y) curves, enabling LLMs to answer 14,740 numerical questions at 4.2% median relative error, 13× lower than the error of the strongest VLM reading the chart images directly.
Abstract: Scientific line plots are ubiquitous in materials-science papers, but current vision–language models (VLMs) cannot use them as reliable quantitative evidence. From 55,763 articles, we extract 1,375,165 subfigures and identify 657,428 line-plot panels (nearly half of all subfigures), showing that such plots are a major carrier of experimental knowledge. We find that hosted VLMs can recognize chart structure and curve morphology but fail at pixel-level grounding: despite high instance agreement, at most 1.2% of their predicted curves achieve IoU ≥ 0.5 against ground-truth rasterizations. We introduce MatDeplot, a local pipeline that extracts axis-calibrated (x, y) curves from scientific line plots. On MatCurvs-204, MatDeplot achieves 45.8% curve-level IoU success, a 38× improvement over the strongest hosted VLM, while running in 1.5 s and at $0.012 per image. More importantly, this pixel-faithful extraction substantially improves scientific reasoning: on MatCurvs-Reasoning, LLMs using MatDeplot-extracted curves reach 4.2% median relative error, compared with 54.3% for VLMs reading the chart images directly; on a manually verified subset, GPT-5.4's error falls from 15.4% to 4.6%, approaching a deterministic SciPy oracle. These results show that scientific agents should reconstruct plots into faithful structured data before reasoning over them. Code, benchmarks, and the unified evaluator are available from the authors upon reasonable request.
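The abstract relies on three quantities: axis calibration (pixels to data values), curve-level IoU against a rasterized ground truth with a 0.5 success threshold, and median relative error on numerical answers. The following is a minimal sketch of each. It assumes two labeled ticks per axis are enough to calibrate a linear or log10 axis, and that curves are compared as dilated pixel masks. All function names, the dilation radius, and the half-pixel sampling step are hypothetical choices made for illustration. This is not MatDeplot's implementation or the paper's unified evaluator.

```python
import math
from statistics import median


def make_axis_map(p0, v0, p1, v1, log_scale=False):
    """Map a pixel coordinate to a data value from two calibration ticks.

    Hypothetical helper: p0, p1 are pixel positions of two labeled ticks and
    v0, v1 their values. For log axes the interpolation is done in log10 space.
    """
    if log_scale:
        v0, v1 = math.log10(v0), math.log10(v1)
    scale = (v1 - v0) / (p1 - p0)

    def to_data(p):
        v = v0 + (p - p0) * scale
        return 10 ** v if log_scale else v

    return to_data


def pixels_to_curve(pixels, x_map, y_map):
    """Convert a pixel-space polyline [(col, row), ...] to data-space (x, y)."""
    return [(x_map(c), y_map(r)) for c, r in pixels]


def rasterize(points, radius=1):
    """Rasterize a pixel-space polyline into a set of (col, row) cells,
    dilated by `radius` pixels so that thin strokes can overlap."""
    cells = set()
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Sample roughly every half pixel along each segment.
        n = max(1, math.ceil(2 * math.hypot(x1 - x0, y1 - y0)))
        for i in range(n + 1):
            t = i / n
            cx = round(x0 + t * (x1 - x0))
            cy = round(y0 + t * (y1 - y0))
            # Square dilation around each sampled pixel.
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    cells.add((cx + dx, cy + dy))
    return cells


def curve_iou(pred, gt, radius=1):
    """Intersection-over-union of two rasterized pixel-space curves."""
    a, b = rasterize(pred, radius), rasterize(gt, radius)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iou_success_rate(pairs, threshold=0.5):
    """Fraction of (predicted, ground-truth) curve pairs with IoU >= threshold."""
    hits = sum(curve_iou(p, g) >= threshold for p, g in pairs)
    return hits / len(pairs)


def median_relative_error(preds, targets):
    """Median of |pred - target| / |target|, skipping zero-valued targets."""
    return median(abs(p - t) / abs(t) for p, t in zip(preds, targets) if t != 0)


if __name__ == "__main__":
    # Hypothetical calibration: x ticks at 100 px -> 0 and 500 px -> 400;
    # y ticks at row 400 -> 0.0 and row 80 -> 1.6 (image rows grow downward).
    x_map = make_axis_map(100, 0.0, 500, 400.0)
    y_map = make_axis_map(400, 0.0, 80, 1.6)
    print(pixels_to_curve([(100, 400), (300, 240)], x_map, y_map))
    line = [(0, 0), (10, 0)]
    print(curve_iou(line, line), curve_iou(line, [(0, 10), (10, 10)]))  # prints 1.0 0.0
```

In this sketch the dilation radius sets how tolerant the IoU is to small stroke offsets. With `radius=1`, a curve displaced by more than a couple of pixels no longer overlaps its target at all. This shows how a prediction can match a curve's shape yet still fall below the IoU ≥ 0.5 threshold.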
Submission Number: 265