Benchmarking and Enhancing Text-to-Image Models for Generating Visual Representations in Early Arithmetic Education

ACL ARR 2026 January Submission 2071 Authors

01 Jan 2026 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Equation-to-Visual Generation, Text-to-Image Evaluation, Multimodal Benchmarks, Visual Representations for Learning, Early Arithmetic Education

Abstract: Visual representations are highly effective in early arithmetic education, as they make abstract mathematical symbols more concrete and support the development of numeracy and reasoning skills. However, creating such visuals is labor-intensive for teachers. In this work, we introduce the equation-to-visual generation task and E2V-Bench, a benchmark for generating pedagogically meaningful visuals from arithmetic equations. Developed with insights from primary school math teachers and informed by visual patterns extracted from six educational resources, E2V-Bench comprises 1.5K arithmetic problems spanning four visual types. We also propose new automatic metrics for evaluating generated visuals. A systematic evaluation on E2V-Bench reveals that open-source text-to-image models perform substantially worse than the strongest closed-source models. Building on these findings, we curate a high-quality training dataset and demonstrate that our model adaptation strategies, including rejection sampling fine-tuning, prompt refinement, and regeneration, significantly improve model performance. This work establishes a foundation for studying equation-to-visual generation and facilitates automated tools that support teachers in creating visuals for arithmetic education.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: cross-modal content generation, cross-modal application, multimodality

Contribution Types: NLP engineering experiment, Data resources, Data analysis

Languages Studied: English

Submission Number: 2071
