COGGEN: BRIDGING VISUAL MIMICRY AND COGNITIVE ALIGNMENT IN FRONTEND CODE GENERATION VIA GAZE-ATTENTION DIFFUSION
Abstract: Contemporary frontend code generation paradigms predominantly rely on Multimodal Large Language Models (MLLMs)
to map static visual artifacts to Document Object Model
(DOM) structures. While effective at visual imitation, these
approaches suffer from "interaction blindness": they generate code that is visually faithful but functionally brittle or cognitively taxing for end users. In this paper, we propose
CogGen, a neuro-symbolic framework that redefines interface synthesis as a trajectory optimization problem within
a latent user-intent manifold. Unlike direct pixel-to-code
translation, CogGen introduces a Gaze-Attention Diffusion
Bridge that hallucinates temporal interaction heatmaps prior
to code generation, effectively predicting user focus flow before syntax construction. We further propose a differentiable
"Cognitive Load Loss" function, trained on a massive dataset of simulated eye-tracking and cursor dynamics, which penalizes generated abstract syntax trees (ASTs) that induce high friction or accessibility violations, even if they satisfy the visual prompt. By integrating a lightweight, differentiable
visual prompt. By integrating a lightweight, differentiable
rendering engine directly into the gradient loop, CogGen
optimizes for interaction ergonomics rather than mere pixel
reconstruction error. Experiments across the WebBench-2026
suite demonstrate that CogGen achieves a 42% reduction in
predicted user interaction latency and spontaneously corrects
"dark patterns" in UI designs, significantly outperforming
state-of-the-art MLLMs in functional robustness while maintaining high visual fidelity. This work establishes a new
frontier in human-centric program synthesis, shifting the
objective from visual mimicry to cognitive alignment.
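To make the "Cognitive Load Loss" idea concrete, the following is a minimal illustrative sketch, not the paper's implementation: it assumes a predicted gaze-attention distribution over UI elements and hypothetical per-element friction scores, and penalizes both high expected interaction cost and scattered user focus (via an entropy term). All names and weights here are assumptions for illustration.

```python
import numpy as np

def cognitive_load_loss(attention, friction, entropy_weight=0.1):
    """Illustrative sketch of a cognitive-load-style loss (hypothetical).

    attention: predicted gaze probabilities over UI elements (sums to 1),
               e.g. from a gaze-attention heatmap pooled per element.
    friction:  assumed per-element interaction cost (e.g. small tap
               targets or low-contrast text would score higher).

    Loss = expected friction under the predicted gaze distribution,
    plus an entropy penalty that discourages layouts scattering focus.
    """
    attention = np.asarray(attention, dtype=float)
    friction = np.asarray(friction, dtype=float)
    # Expected interaction cost weighted by where the user is likely to look.
    expected_friction = float(np.dot(attention, friction))
    # Shannon entropy of the attention distribution (focus dispersion).
    entropy = float(-np.sum(attention * np.log(attention + 1e-12)))
    return expected_friction + entropy_weight * entropy

# A layout that concentrates attention on a low-friction element scores
# lower than one that scatters attention across high-friction elements.
focused = cognitive_load_loss([0.9, 0.05, 0.05], [0.1, 1.0, 1.0])
scattered = cognitive_load_loss([1/3, 1/3, 1/3], [0.1, 1.0, 1.0])
```

In the full framework this quantity would be computed on differentiably rendered outputs so that gradients flow back to the code generator; the NumPy version above only demonstrates the scalar objective.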