VideoForge: Efficient Domain Adaptation for Video Generation Through Quality-Driven Rewards and Enhanced LoRA

Xiaogang Wang, Bill Cai, GIRISH DILIP PATIL, Yash Shah

Published: 30 Mar 2026, Last Modified: 14 May 2026, WACVW, Everyone, CC BY-NC-ND 4.0

Abstract: Video generation from textual descriptions has attracted significant interest, yet existing methods often struggle to produce videos that are both high-quality and closely aligned with the text prompt. Current models frequently suffer from poor temporal coherence, inconsistent adherence to the text, and difficulty adapting to new domains, which limits their practical utility. In this paper, we introduce VideoForge, an efficient approach to domain adaptation for video generation that addresses these issues through two key innovations. First, we propose reward models designed explicitly to encourage both visual fidelity and consistency during training. Second, we extend standard LoRA with a small nonlinear bottleneck on top of the base layer, rather than a purely linear low-rank update. The resulting adapter effectively selects different low-rank “modes” for different inputs, which can be viewed as a mixture of multiple subspaces. Extensive qualitative and quantitative experiments show that VideoForge achieves state-of-the-art performance, significantly outperforming existing baselines in video quality, coherence, and alignment with the textual input.
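
To make the nonlinear-bottleneck LoRA idea in the abstract concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes the adapter is added to the output of a frozen base layer, uses ReLU as the bottleneck nonlinearity, zero-initializes the up-projection, and scales the update by alpha / rank as in standard LoRA. None of these choices is stated in the abstract, and the names `NonlinearLoRALinear`, `active_modes`, and `matvec` are hypothetical.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def relu(v):
    """Elementwise ReLU; the choice of nonlinearity is an assumption."""
    return [max(0.0, a) for a in v]


class NonlinearLoRALinear:
    """Frozen base weight W plus a bottleneck update: y = W x + scale * B relu(A x)."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Base weight: stands in for a pretrained layer and is never updated.
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Down-projection into the rank-sized bottleneck (trainable in practice).
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        # Up-projection, zero-initialized so the adapter starts as a no-op
        # (a standard LoRA convention, assumed here).
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def active_modes(self, x):
        """Indices of bottleneck units that the ReLU leaves switched on for input x."""
        return [i for i, h in enumerate(matvec(self.A, x)) if h > 0.0]

    def forward(self, x):
        base = matvec(self.W, x)
        hidden = relu(matvec(self.A, x))  # nonlinearity inside the bottleneck
        delta = matvec(self.B, hidden)
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = NonlinearLoRALinear(d_in=4, d_out=3, rank=2)
x = [1.0, -0.5, 0.25, 2.0]
y = layer.forward(x)
```

Under these assumptions, only the bottleneck units with a positive pre-activation contribute for a given input, so the effective update for that input is the low-rank matrix built from the active columns of `B` and the matching rows of `A`. Different inputs switch on different subsets, which is one way to read the "mixture of multiple subspaces" description. Removing the ReLU recovers the standard linear LoRA update.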