In-the-wild Pretrained Models Are Good Feature Extractors for Video Quality Assessment

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: video quality assessment, pretrained models, metric learning
Abstract: Video quality assessment (VQA) is a challenging problem, since the perceptual quality of a video can be affected by many factors, e.g., content attractiveness, distortion type and level, and motion pattern. Further, the high cost of annotation limits the scale of VQA datasets, which is the main obstacle for deep-learning-based VQA methods. In this paper, we propose a VQA method leveraging PreTrained Models, named PTM-VQA, to transfer knowledge from models pretrained on various pre-tasks and benefit VQA from different aspects. Specifically, features of input videos are extracted by different pretrained models with frozen weights, transformed to the same dimension, and integrated to generate the final representation. Since these models possess various fields of knowledge and are often trained with labels irrelevant to quality, we propose an Intra-Consistency and Inter-Divisibility (ICID) loss, which imposes constraints on features extracted by multiple pretrained models from different samples. The intra-consistency constraint is model-wise and requires features extracted by different pretrained models to lie in the same unified quality-aware latent space, while the sample-wise inter-divisibility constraint introduces pseudo clusters based on the annotations of samples and tries to separate features of samples from different clusters. Further, confronted with a constantly growing number of pretrained models, it is crucial to determine which ones to use and how to use them. To tackle this problem, we propose an efficient scheme to choose suitable candidates: models that achieve better clustering performance on a VQA dataset are chosen as our candidate backbones. Extensive experiments demonstrate the effectiveness of the proposed method.
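The abstract's two constraints can be sketched concretely. The following is a minimal, illustrative NumPy implementation of an ICID-style loss, not the authors' actual formulation: all function names, the quantile-based pseudo-clustering, and the cosine-similarity formulation are assumptions for illustration. It takes features from M frozen backbones (already projected to a common dimension), pulls together the per-video features across models (intra-consistency), and pushes apart centroids of score-based pseudo clusters (inter-divisibility).

```python
import numpy as np

def l2norm(x, eps=1e-8):
    """Normalize feature vectors along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def icid_loss(feats, mos, n_clusters=3):
    """Illustrative ICID-style loss (assumed form, not the paper's exact loss).

    feats: (M, N, D) array - features of N videos from M pretrained models,
           already transformed to a shared dimension D.
    mos:   (N,) mean opinion scores used to build pseudo clusters.
    """
    M, N, D = feats.shape
    f = l2norm(feats)

    # Intra-consistency: features of the same video from different models
    # should agree, so we maximize their pairwise cosine similarity.
    intra, pairs = 0.0, 0
    for i in range(M):
        for j in range(i + 1, M):
            intra += (f[i] * f[j]).sum(-1).mean()
            pairs += 1
    intra_loss = 1.0 - intra / pairs  # 0 when all models agree perfectly

    # Inter-divisibility: bin videos into pseudo clusters by quality score
    # and penalize similarity between cluster centroids.
    avg = l2norm(f.mean(axis=0))
    thresholds = np.quantile(mos, np.linspace(0, 1, n_clusters + 1)[1:-1])
    bins = np.digitize(mos, thresholds)
    centroids = l2norm(np.stack([avg[bins == c].mean(axis=0)
                                 for c in range(n_clusters)]))
    inter, pairs = 0.0, 0
    for a in range(n_clusters):
        for b in range(a + 1, n_clusters):
            inter += (centroids[a] * centroids[b]).sum()
            pairs += 1
    inter_loss = inter / pairs  # lower when clusters are well separated

    return intra_loss + inter_loss
```

In a full pipeline, this scalar would be minimized jointly with a quality-regression objective while the pretrained backbones stay frozen and only the projection heads are trained.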
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
TL;DR: In-the-wild pretrained models can be used as feature extractors to represent the perceptual quality of videos directly.
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)