GraphicWeaver: Benchmarking Agentic Planning for Graphic Design Generation

Published: 05 May 2026, Last Modified: 05 May 2026, 4th ALVR Poster, CC BY 4.0
Keywords: planning benchmark, agentic planning, vision-language agent, graphic design generation
TL;DR: We introduce GraphicWeaver, a planning benchmark grounded in real-world graphic design needs, to assess the complex design planning and tool-use capabilities of vision-language agents.
Abstract: Vision-language model (VLM)-powered agents are increasingly enabling new forms of automation across various human tasks. While prior work has primarily focused on well-defined problems with explicit goals, the capabilities of agents in creative graphic design, where goals are inherently open-ended and subjective, remain largely underexplored. To bridge this gap, we introduce GraphicWeaver, a planning benchmark for graphic design comprising 1,079 diverse user queries and associated images spanning four design categories. Comprehensive experiments with six models reveal that current VLM-based agents struggle to handle such complex planning tasks, which require taking into account both explicit design constraints specified in queries and implicit commonsense design principles. We attribute these failures to challenges in (1) retrieving appropriate parameters for tool usage, (2) understanding spatial relationships across design components, and (3) coordinating dependencies across agents. We envision GraphicWeaver as a challenging yet valuable testbed for advancing VLM agent planning in creative design contexts.
Submission Number: 8