Prompt Learning with Cross-Modal Feature Alignment for Visual Domain Adaptation

Published: 01 Jan 2022, Last Modified: 15 May 2023. CICAI (1) 2022.
Abstract: Exploring the capacity of pre-trained large-scale models to learn common features of multimodal data, and the effect of knowledge transfer on downstream tasks, are two major trends in the multimedia field. However, existing studies usually use pre-trained models as feature extractors, or as teacher models for knowledge distillation on downstream tasks. As a result, the cross-modal knowledge transfer mechanism and the knowledge forgetting problem of pre-trained large models have not been fully investigated. To address these issues, this paper explores the fine-tuning strategy, the feature selection strategy, and the semantic guidance approach used when transferring pre-trained large models. To tackle knowledge forgetting during fine-tuning, we propose PMHANet, an image classification algorithm that integrates a pre-trained large-scale model with heterogeneous feature alignment. More importantly, this provides a cross-modal knowledge transfer paradigm for multimodal pre-trained large models. Experiments on VireoFood-172 and NUS-WIDE show that large models trained on datasets such as COCO perform better on NUS-WIDE, whose domain is similar to the pre-training data, than on the domain-specific VireoFood-172. PMHANet effectively enhances multimodal representations in downstream tasks on top of a partially fine-tuned pre-trained large model, achieving SOTA performance on both datasets.
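
The abstract names two ingredients: partially fine-tuning a pre-trained model to limit knowledge forgetting, and aligning heterogeneous (image/text) features for cross-modal transfer. The sketch below is not the authors' PMHANet implementation; it is a minimal, hedged illustration of those two generic ideas in PyTorch, assuming CLIP-style image and text encoders that each emit a fixed-size feature vector. All module names, dimensions, and the contrastive alignment loss are illustrative assumptions.

    # Minimal sketch (not the authors' code): partially fine-tune a pre-trained
    # vision-language model and align image/text features with a contrastive loss.
    # Encoder choice, feature dimension, and temperature are illustrative assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PartiallyTunedAligner(nn.Module):
        def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module, dim: int = 512):
            super().__init__()
            self.image_encoder = image_encoder
            self.text_encoder = text_encoder
            # Freeze the pre-trained encoders to limit knowledge forgetting;
            # only the lightweight projection heads below receive gradients.
            for p in self.image_encoder.parameters():
                p.requires_grad = False
            for p in self.text_encoder.parameters():
                p.requires_grad = False
            self.image_proj = nn.Linear(dim, dim)
            self.text_proj = nn.Linear(dim, dim)
            self.logit_scale = nn.Parameter(torch.tensor(2.6593))  # ~log(1/0.07)

        def forward(self, images, tokens):
            img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
            txt = F.normalize(self.text_proj(self.text_encoder(tokens)), dim=-1)
            return img, txt

    def alignment_loss(img, txt, logit_scale):
        # Symmetric InfoNCE-style loss pulling matched image/text pairs together,
        # a common stand-in for cross-modal feature alignment objectives.
        logits = logit_scale.exp() * img @ txt.t()
        targets = torch.arange(img.size(0), device=img.device)
        return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

In this sketch, freezing the encoders stands in for the "partial fine-tuning" the abstract refers to, while the projection heads and contrastive objective stand in for heterogeneous feature alignment; the paper's actual fine-tuning split, feature selection, and semantic guidance may differ.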