FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback
Keywords: front-end, code generation, multi-turn code generation, multi-turn conversation, visual coding
Abstract: Multi-turn, multi-modal interaction is common in human–AI collaborative coding, where users iteratively refine implementations using both language and visual feedback. For example, in front-end development, users often combine textual instructions with visual artifacts such as sketches, mockups, and annotated screenshots. Despite its prevalence, this workflow remains largely overlooked: existing research has primarily studied simplified single-turn or text-only settings. To address this gap, we introduce **FronTalk**, a benchmark for **multi-turn, multi-modal code generation** in front-end development. FronTalk contains 100 multi-turn dialogues from diverse real-world websites. Each turn includes a textual instruction and an equivalent visual instruction. We further propose a novel *agent-based evaluation framework* that uses a web agent to simulate user interactions and measure both the implementation correctness and the user experience of the generated websites. Evaluation of 20 models reveals two fundamental challenges: (1) a *forgetting issue*, where models often overwrite previously implemented features, and (2) a *visual interpretation gap*, where models struggle to translate visual inputs into functional requirements. To address these challenges, we propose AceCoder, which employs a web agent to autonomously critique implementations and verify compliance with prior instructions. This reduces the forgetting rate from 8.0–28.2% to **nearly zero**. In terms of overall performance, AceCoder improves the textual feedback setting by up to **9.3** percentage points (56.0% $\rightarrow$ 65.3%) and also mitigates the visual interpretation challenge, boosting performance by up to **5.2** percentage points (55.0% $\rightarrow$ 60.2%) with visual feedback.
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 17