Abstract: Multimodal large language models (MLLMs) have recently shown strong potential for translating WebUI screenshots into code, yet existing studies focus mainly on single-target settings such as HTML and often overlook a more practical challenge: generating executable code across multiple front-end frameworks. Compared with vanilla HTML, framework-based generation requires the model to achieve visual fidelity while also satisfying framework-specific syntax, component organization, dependency consistency, and compilation constraints. As a result, directly transferring existing WebUI-to-code approaches to React, Vue, and Angular often leads to severe drops in executability.
To address this problem, we study executable WebUI generation across front-end frameworks with MLLMs and propose a progressive learning framework that explicitly accounts for cross-framework heterogeneity during adaptation. Our approach combines a shared multimodal backbone with framework-aware adaptation modules and adopts a phase-wise training strategy that orders frameworks by their compatibility, so that the model first learns transferable interface knowledge and then gradually absorbs framework-specific implementation patterns. This design improves knowledge sharing across frameworks while reducing the destructive interference caused by large differences in syntax and project structure.
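As an illustration only, the phase-wise schedule described above might be organized as in the following Python sketch. It is not the paper's implementation: the compatibility scores, the threshold, the rule that the shared backbone is trained only in the first phase, and the names `Phase`, `build_phases`, and `trainable_modules` are all hypothetical choices made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Phase:
    """One training phase: which frameworks it covers and what is trainable."""
    frameworks: list[str]
    train_backbone: bool  # assumption: backbone is updated only in the first phase


def build_phases(compat: dict[str, float], threshold: float = 0.5) -> list[Phase]:
    """Order frameworks from most to least compatible and split them into phases.

    Frameworks at or above the (hypothetical) threshold share one early phase
    that also trains the backbone; each remaining framework gets its own later
    phase in which only its adapter is trained.
    """
    ordered = sorted(compat, key=lambda name: compat[name], reverse=True)
    early = [f for f in ordered if compat[f] >= threshold]
    late = [f for f in ordered if compat[f] < threshold]

    phases: list[Phase] = []
    if early:
        phases.append(Phase(early, train_backbone=True))
    phases.extend(Phase([f], train_backbone=False) for f in late)
    return phases


def trainable_modules(phase: Phase) -> list[str]:
    """List the module names that would receive gradient updates in a phase."""
    modules = ["backbone"] if phase.train_backbone else []
    modules += [f"adapter:{f}" for f in phase.frameworks]
    return modules


# Placeholder compatibility scores, invented for illustration (not measured values).
COMPAT = {"html": 1.0, "vue": 0.7, "react": 0.6, "angular": 0.3}
PHASES = build_phases(COMPAT)

for index, phase in enumerate(PHASES, start=1):
    print(f"phase {index}: {trainable_modules(phase)}")
```

In this sketch, restricting later phases to adapter updates is one simple way to limit interference with the shared representation; other choices, such as a reduced backbone learning rate, would fit the same description.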
Experiments on multi-framework WebUI code generation benchmarks show that our method consistently improves compilation success and overall executability across diverse frameworks, while maintaining competitive visual fidelity. Further analysis demonstrates that phase-wise adaptation yields more stable cross-framework transfer and better balances shared abstraction learning with framework-specific specialization.