Keywords: SVG, Multimodal, vector graphics, code generation, VLM, LLM
TL;DR: We introduce VectorGym, a benchmark suite for SVG generation and editing that covers Sketch2SVG, SVG Editing, Text2SVG, and SVG captioning. It includes diverse, human-annotated test sets and model evaluations, offering new challenges and insights for vector graphics tasks.
Abstract: We introduce VectorGym, a multi-task benchmark for evaluating Vision-Language Models (VLMs) on Scalable Vector Graphics (SVG) code generation and manipulation. VectorGym addresses the lack of challenging benchmarks aligned with real-world design workflows, which require mastery of complex primitives and multi-step edits. The benchmark comprises four complementary tasks: the novel Sketch2SVG conversion task (VG-Sketch); a new SVG editing dataset (VG-Edit) involving higher-order primitives and semantic reasoning; and rigorous benchmarks for Text2SVG (VG-Text) and SVG captioning (VG-Cap). All tasks use expert human-authored SVG annotations, ensuring a rigorous challenge. VectorGym also introduces a VLM-as-judge metric tailored for SVG generation, validated against human judgment. Our evaluation of leading VLMs and our own GRPO-trained models reveals significant performance gaps, establishing VectorGym as a robust framework for advancing visual code generation.
Primary Area: datasets and benchmarks
Submission Number: 21347