ADCMT: An Augmentation-Free Dynamic Contrastive Multi-Task Transformer for UGC-VQA

Hui Li, Kaibing Zhang, Jie Li, Xinbo Gao, Guang Shi

Published: 2025 · Last Modified: 08 Mar 2026 · IEEE Trans. Broadcast. 2025 · License: CC BY-SA 4.0
Abstract: Quantifying the quality of user-generated content (UGC) videos is particularly challenging due to the presence of complex multi-source distortions and the limited availability of annotated samples. Many current approaches to UGC video quality assessment (UGC-VQA) address these challenges by applying distortion augmentation and contrastive learning strategies to enhance performance. However, the distribution of augmented samples deviates from that of raw UGC videos, yielding only limited improvement. In this paper, we propose a novel Augmentation-free Dynamic Contrastive Multi-task Transformer (ADCMT) for UGC-VQA. Specifically, the primary task of quality score regression and an auxiliary task of feature recalibration are jointly addressed using a supervised contrastive learning multi-task transformer. The quality label space is partitioned into several subspaces to coarsely and dynamically guide the feature reconstruction in each mini-batch, enhancing its quality-awareness capabilities. This approach ensures that the distribution of embedded perceptual features aligns more closely with quality perception, effectively yielding fine-grained quality score regression. Thorough experiments carried out on six publicly available UGC-VQA databases (KoNViD-1k, CVD2014, LIVE-Qualcomm, LIVE-VQC, YouTube-UGC, and LSVQ-Subset) demonstrate that the proposed ADCMT achieves significant performance improvement over other state-of-the-art competitors. The source code will be available at https://github.com/kbzhang0505/ADCMT.
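The core idea described in the abstract — partitioning the continuous quality label space into subspaces per mini-batch, then using subspace membership to define positives for a supervised contrastive objective — can be sketched as follows. This is an illustrative NumPy sketch of the general technique, not the authors' implementation; the function names, the number of bins, and the SupCon-style loss form are assumptions for exposition.

```python
import numpy as np

def partition_labels(mos, num_bins=4):
    """Coarsely partition continuous MOS labels into quality subspaces.

    Binning is dynamic: edges span the current mini-batch's own score
    range, so subspaces adapt to each batch (an assumed design, mirroring
    the abstract's "dynamically guide ... in each mini-batch").
    """
    edges = np.linspace(mos.min(), mos.max(), num_bins + 1)
    # Samples falling in the same bin are treated as positives.
    return np.clip(np.digitize(mos, edges[1:-1]), 0, num_bins - 1)

def supervised_contrastive_loss(features, bins, temperature=0.1):
    """SupCon-style loss where positives share a quality subspace (bin)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / temperature          # pairwise cosine similarities
    n = len(bins)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and bins[j] == bins[i]]
        if not pos:
            continue                     # no positives for this anchor
        logits = np.delete(sim[i], i)    # exclude self-similarity
        log_denom = np.log(np.exp(logits).sum())
        idx = [j if j < i else j - 1 for j in pos]
        loss += -np.mean(logits[idx] - log_denom)
    return loss / n
```

In use, low-scoring and high-scoring videos in a batch end up in different bins, so the loss pulls embeddings of similar-quality clips together without any distortion augmentation, consistent with the "augmentation-free" framing.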