Figma2Code: Automating Multimodal Design to Code in the Wild

ICLR 2026 Conference Submission 920 Authors

02 Sept 2025 (modified: 23 Dec 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Code Generation, Design to Code
Abstract: Front-end development constitutes a substantial portion of software engineering, yet converting design mockups into production-ready *User Interface* (UI) code remains tedious and time-consuming. While recent work has explored automating this process with *Multimodal Large Language Models* (MLLMs), existing approaches typically rely solely on design images. As a result, they must infer complex UI details from images alone, often leading to degraded results. In real-world development workflows, however, design mockups are usually delivered as Figma files (Figma being a widely used front-end design tool), which embed rich multimodal information (e.g., metadata and assets) essential for generating high-quality UI. To bridge this gap, we introduce Figma2Code, a new task that generalizes *design-to-code* into a multimodal setting and aims to automate *design-to-code* in the wild. Specifically, we collect paired design images and their corresponding metadata files from the Figma community. We then apply a series of processing operations, including rule-based filtering, human and MLLM-based annotation and screening, and metadata refinement. This process yields 3,055 samples, from which designers curate a balanced dataset of 213 high-quality cases. Using this dataset, we benchmark ten state-of-the-art open-source and proprietary MLLMs. Our results show that while proprietary models achieve superior visual fidelity, they remain limited in layout responsiveness and code maintainability. Further experiments across modalities and ablation studies corroborate this limitation, which we attribute in part to models' tendency to directly map primitive visual attributes from Figma metadata.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 920