VectorGym: A Multi-Task Benchmark for SVG Code Generation and Manipulation

Juan A. Rodriguez; Haotian Zhang; Abhay Puri; Aly Shariff; Meng lin; Xiaoqing Xie; Tianyang Zhang; Rishav Pramanik; Sai Rajeswar; Perouz Taslakian; Spandana Gella; David Vazquez; Christopher Pal; Marco Pedersoli

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: SVG, Multimodal, vector graphics, code generation, VLM, LLM

TL;DR: We introduce VectorGym, a benchmark suite for SVG generation and editing: Sketch2SVG, SVG Editing, and Text2SVG. It includes diverse, human-annotated test sets and model evaluations, offering new challenges and insights for vector graphics tasks.

Abstract: We introduce VectorGym, a multi-task benchmark for evaluating Vision-Language Models (VLMs) on Scalable Vector Graphics (SVG) code generation and manipulation. VectorGym addresses the critical lack of challenging benchmarks aligned with real-world design workflows, specifically requiring mastery of complex primitives and multi-step edits. Our benchmark comprises four complementary tasks: the novel Sketch2SVG (VG-Sketch) conversion; a new SVG editing dataset (VG-Edit) involving higher-order primitives and semantic reasoning; and rigorous benchmarks for Text2SVG (VG-Text) and SVG captioning (VG-Cap). VectorGym derives particular value from expert human-authored SVG annotations across all tasks, ensuring a rigorous challenge. VectorGym also introduces a VLM-as-judge metric tailored for SVG generation, validated against human judgment. Our comprehensive evaluation of leading VLMs and our own GRPO-trained models reveals significant performance gaps, establishing VectorGym as a robust framework for advancing visual code generation.

Primary Area: datasets and benchmarks

Submission Number: 21347
