Keywords: SVG, multimodal, vector graphics, code generation, VLM, LLM
TL;DR: We introduce VectorGym, a benchmark suite for SVG generation and editing that covers four tasks: Sketch2SVG, SVG Editing, Text2SVG, and SVG captioning. It includes diverse, human-annotated test sets and model evaluations, offering new challenges and insights for vector graphics tasks.
Abstract: We introduce VectorGym, a comprehensive multi-task benchmark for evaluating Vision-Language Models (VLMs) on Scalable Vector Graphics (SVG) code generation and manipulation. VectorGym addresses the need for systematic evaluation across diverse SVG-related capabilities in the emerging field of visual code generation. The benchmark comprises four complementary tasks: Sketch2SVG conversion, SVG editing with natural language instructions, Text2SVG generation, and SVG captioning. It introduces the Sketch2SVG task and the first dataset of complex, human-authored SVG edits, with gold-standard human annotations across all tasks. We propose a novel automatic VLM-as-judge evaluation metric tailored to SVG generation tasks and validate it through human correlation studies across multiple state-of-the-art models. A comprehensive evaluation of leading closed-source and open-source VLMs reveals significant performance variation across tasks, highlighting both current capabilities and critical limitations. VectorGym offers the research community a robust framework for measuring progress in SVG generation.
Primary Area: datasets and benchmarks
Submission Number: 21347