WebGen-R1: Incentivizing LLMs to Generate Functional and Aesthetic Websites with Reinforcement Learning

ICLR 2026 Conference Submission 25460 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Language Model, Code Generation, Website Generation, Reinforcement Learning
Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in function-level code generation, yet their performance remains limited in project-level scenarios such as generating large-scale multi-page websites. Such tasks require coherent multi-file structures, handling of intricate cross-page dependencies, and visually appealing designs. Prior work addresses only partial aspects of this challenge: WebDev Arena, for instance, focuses exclusively on single-page static sites, while agent-based frameworks decompose tasks into subtasks coordinated through multi-turn execution, often relying on proprietary models and suffering from fragile integration, particularly in visual coherence and stylistic consistency. In this work, we introduce WebGen-R1, which pursues the more ambitious and practically relevant goal of training a small-scale LLM via reinforcement learning (RL) to generate entire multi-page websites end-to-end. A key obstacle lies in reward design. Unlike function-level code generation, where correctness can be verified by automated test suites, web aesthetics (layout harmony, typographic consistency, and stylistic alignment) are inherently subjective, and functional verification often requires dynamic execution across pages, where rule-based reward functions tend to be brittle. To address these challenges, we design a vision-language-model (VLM)-based reward model that jointly optimizes functional correctness and aesthetic quality, enabling the model to produce websites that are both visually coherent and faithful to the intended task specification. Extensive experiments on real-world benchmarks demonstrate that WebGen-R1 consistently matches or outperforms strong proprietary and open-source baselines under a multi-dimensional evaluation protocol. To facilitate future research on end-to-end multi-page website generation, we release our code and data at https://anonymous.4open.science/r/WebGen-R1.
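To make the reward design concrete, the sketch below illustrates one way a composite VLM-judged reward could combine functional and aesthetic signals into a single scalar for RL training. This is a minimal illustration only: the function names, the linear weighting, the [0, 1] score ranges, and the stubbed judge calls are assumptions, not details taken from the paper.

```python
# Illustrative sketch of a composite VLM-judged reward for website generation.
# All names, weights, and score ranges below are assumptions for illustration,
# not the paper's actual implementation.

from dataclasses import dataclass, field

@dataclass
class WebsiteRollout:
    files: dict                                      # path -> source, e.g. {"index.html": "<html>..."}
    screenshots: list = field(default_factory=list)  # rendered page captures fed to the VLM judge

def score_functional(rollout: WebsiteRollout, spec: str) -> float:
    # Stub: in practice a VLM judge would check the rendered pages against
    # the task spec (required pages present, links resolve, forms work).
    return 1.0  # placeholder score in [0, 1]

def score_aesthetic(rollout: WebsiteRollout) -> float:
    # Stub: in practice a VLM judge would rate layout harmony, typography,
    # and cross-page stylistic consistency from the screenshots.
    return 1.0  # placeholder score in [0, 1]

def reward(rollout: WebsiteRollout, spec: str,
           w_func: float = 0.5, w_aes: float = 0.5) -> float:
    # Scalar RL reward: a weighted sum of the two judged dimensions.
    return w_func * score_functional(rollout, spec) + w_aes * score_aesthetic(rollout)

# Usage: score one sampled multi-page website against its prompt.
rollout = WebsiteRollout(files={"index.html": "<html>...</html>",
                                "about.html": "<html>...</html>"})
print(reward(rollout, spec="A two-page portfolio site with consistent styling"))
```

A single scalar of this form is what lets a standard RL objective optimize functional correctness and aesthetics jointly, rather than requiring separate training signals for each dimension.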
Primary Area: foundation or frontier models, including LLMs
Submission Number: 25460