Abstract: The rapid development of video generation technology has created significant demand for quality assessment of the latest AI-generated videos. However, current supervised approaches depend on expensive and quickly outdated human scores, while label-free methods overlook the distortions typical of AI-generated videos. To address these limitations, we introduce LMVQ, a Label-free Metric-learning framework for general AI-generated Video Quality assessment across three dimensions: spatial, temporal, and alignment. LMVQ is the first to introduce sample degradations specially designed for AIGC-specific distortions, and it constructs a comprehensive training set through two complementary sample generation strategies. It then employs two synergistic modules: the Intra-Quality Token Transformer (IQ-Trans), which explicitly refines dimension-specific quality representations, and the Inter-Quality Mixture of Experts (IQ-MoE), which fuses interactions across multiple quality dimensions. Finally, a Multi-Proxy Metric-Learning (MPML) strategy aligns the learned representations with multi-dimensional quality scores and constrains the model to learn discriminative quality-aware representations. Extensive experiments on four public AIGC-VQA benchmarks show that LMVQ outperforms previous label-free methods by over 20% and greatly narrows the gap with supervised methods. This provides a scalable, adaptive foundation for evaluating the ever-evolving quality of AI-generated videos.
DOI: 10.1109/tcsvt.2025.3618655