WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs

Yi Gui; Zhen Li; Yao Wan; Yemin Shi; Hongyu Zhang; Yi Su; Bohua Chen; Dongping Chen; Siyuan Wu; Xing Zhou; Wenbin Jiang; Hai Jin; Xiangliang Zhang

Published: 29 Jan 2025, Last Modified: 29 Jan 2025

Venue: WWW 2025 Oral

License: CC BY 4.0

Track: Web mining and content analysis

Keywords: Benchmark Dataset, Webpage Generation, Code Generation, TreeBLEU

TL;DR: We propose a real-world dataset named Vision2UI, a novel metric named TreeBLEU, and a fine-tuned benchmarking model for automated generation of UI code from design images.

Abstract: Automatically generating webpage code from webpage designs can significantly reduce the workload of front-end developers, and recent Multimodal Large Language Models (MLLMs) have shown promising potential in this area. However, our investigation reveals that most existing MLLMs are constrained by the absence of high-quality, large-scale, real-world datasets, resulting in inadequate performance in automated webpage code generation. To fill this gap, this paper introduces WebCode2M, a new dataset comprising 2.56 million instances, each containing a design image along with the corresponding webpage code and layout details. Sourced from real-world web resources, WebCode2M offers a rich and valuable dataset for webpage code generation across a variety of user scenarios. The dataset quality is ensured by a highly accurate scoring model that filters out instances with aesthetic deficiencies or other incomplete elements. To validate the effectiveness of our proposed dataset, we introduce a baseline model based on the Vision Transformer (ViT), named WebCoder, and establish a benchmark for fair comparison. Additionally, we introduce a new metric, TreeBLEU, to measure structural hierarchy recall. The benchmarking results demonstrate that our dataset significantly improves the ability of MLLMs to generate code from webpage designs, confirming its effectiveness and usability for future applications in front-end design tools. Finally, we highlight several practical challenges introduced by our dataset, calling for further research. We have hosted WebCode2M on an anonymous webpage: https://webcode2m-anonymous.github.io.
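The abstract describes TreeBLEU only as a measure of structural hierarchy recall and does not give its definition. As a rough illustration of what a recall over DOM structure could look like, the sketch below assumes that "structure" means the set of parent-to-children tag patterns (one-level subtrees) in the parsed HTML, and that recall is the fraction of reference patterns also found in the generated page. This is an assumption for illustration, not the authors' definition of TreeBLEU, and the names `one_height_subtrees` and `structural_recall` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _TreeBuilder(HTMLParser):
    """Builds a minimal (tag, children) tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> is recorded as a leaf.
        self.stack[-1][1].append((tag, []))

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray closing tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def one_height_subtrees(html):
    """Return the set of (parent_tag, (child_tag, ...)) patterns in the page."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    found = set()
    todo = [builder.root]
    while todo:
        tag, children = todo.pop()
        if children and tag != "#root":
            found.add((tag, tuple(child[0] for child in children)))
        todo.extend(children)
    return found


def structural_recall(reference_html, hypothesis_html):
    """Fraction of reference patterns that also occur in the hypothesis."""
    ref = one_height_subtrees(reference_html)
    if not ref:
        return 0.0
    hyp = one_height_subtrees(hypothesis_html)
    return len(ref & hyp) / len(ref)
```

Under this assumed definition, a generated page that reproduces every parent-children pattern of the reference scores 1.0 regardless of its text or styling, which is the sense in which such a metric targets hierarchy rather than visual similarity.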

Submission Number: 607
