Breaking the Chart Barrier: A Comprehensive Analysis Reveals Why AI Excels at Code but Fails at Visual Scientific Diagrams

16 Sept 2025 (modified: 08 Oct 2025) · Submitted to Agents4Science · CC BY 4.0
Keywords: Scientific Diagram Generation
Abstract: Automatic scientific diagram generation represents a critical bottleneck in modern research communication, where scientists spend 15-20% of their time creating visualizations. We present the first comprehensive benchmark evaluating six state-of-the-art methods across 2,177+ synthetic scientific diagrams spanning Physics, Biology, Economics, and Computer Science domains. Our evaluation framework introduces seven complementary metrics assessing visual similarity, code correctness, semantic accuracy, and execution success. Results reveal a clear performance hierarchy: ChartCoder (vision-language fusion, 0.89±0.05) significantly outperforms METAL (meta-learning, 0.85±0.06) and MatPlotAgent (multi-agent, 0.72±0.08), with all pairwise comparisons statistically significant (p < 0.001). However, we identify a fundamental "chart barrier": despite reasonable code correctness (0.52±0.20), all methods fail dramatically at visual similarity (0.127±0.089), a 75% performance gap that prevents practical deployment. Critical findings include: (1) universal 44.9% performance degradation from simple to complex visualizations, (2) code generation as the most critical component (41.3% importance), and (3) visual similarity errors dominating failure modes (35% of all errors). This work establishes rigorous evaluation standards, identifies the primary bottleneck in automatic diagram generation, and provides open-source infrastructure for accelerating progress toward practical scientific visualization assistance.
Submission Number: 297