WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs

Published: 29 Jan 2025, Last Modified: 29 Jan 2025WWW 2025 OralEveryoneRevisionsBibTeXCC BY 4.0
Track: Web mining and content analysis
Keywords: Benchmark Dataset, Webpage Generation, Code Generation, TreeBLEU
TL;DR: We have proposed a real-world dataset named Vision2UI, a novel metric named TreeBLEU, and fine-tuned a benchmarking model for automated generation of UI code from design images.
Abstract: Automatically generating webpage code from webpage designs can significantly reduce the workload of front-end developers, and recent Multimodal Large Language Models (MLLMs) have shown promising potential in this area. However, our investigation re- veals that most existing MLLMs are constrained by the absence of high-quality, large-scale, real-word datasets, resulting in inadequate performance in automated webpage code generation. To fill this gap, this paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. Sourced from real- world web resources, WebCode2M offers a rich and valuable dataset for webpage code generation across a variety of user scenarios. The dataset quality is ensured by a highly accurate scoring model that filters out instances with aesthetic deficiencies or other incomplete elements. To validate the effectiveness of our proposed dataset, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. Additionally, we introduce a new metric, TreeBLEU, to measure the structural hierarchy recall. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to gen- erate code from webpage designs, confirming its effectiveness and usability for future applications in front-end design tools. Finally, we highlight several practical challenges introduced by our dataset, calling for further research. We have hosted the WebCode2M on an anonymous webpage: https://webcode2m-anonymous.github.io.
Submission Number: 607
Loading