Abstract: Fine-tuning a pre-trained cross-modal model is an effective way to improve cross-modal retrieval performance. However, conventional fine-tuning typically requires substantial computational resources. To alleviate this requirement, we propose a parameter-efficient tuning method for pre-trained models via prompt learning for cross-modal retrieval. Inspired by prompt learning techniques in natural language processing, our method constructs a multidimensional vector as a prompt for cross-modal retrieval and optimizes only the prompt's small number of parameters to achieve better retrieval performance. Experiments on an open dataset verify that the proposed method is both effective and parameter-efficient.
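The abstract does not specify the backbone architecture or the exact prompt placement, but the general idea of parameter-efficient prompt tuning can be sketched as follows. In this toy illustration (all names and dimensions are assumptions, not the paper's implementation), a large frozen linear projection stands in for the pre-trained cross-modal encoder, and only a small prompt vector added to the text-side feature is optimized to pull the text embedding toward the matching image embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16

# Frozen "pre-trained" encoder: a fixed linear projection standing in for a
# large cross-modal backbone. W is never updated during tuning.
W = rng.standard_normal((d, d))

text_feat = rng.standard_normal(d)   # toy text-side feature
image_feat = rng.standard_normal(d)  # toy image-side feature (retrieval target)

# Learnable prompt: a small vector added to the text feature before encoding.
# These d parameters are the ONLY ones optimized.
prompt = np.zeros(d)

def loss(p):
    # Distance between the prompted text embedding and the image embedding.
    diff = W @ (text_feat + p) - W @ image_feat
    return float(diff @ diff)

lr = 1e-3
losses = [loss(prompt)]
for _ in range(200):
    # Analytic gradient w.r.t. the prompt only; the backbone W stays frozen.
    grad = 2 * W.T @ (W @ (text_feat + prompt) - W @ image_feat)
    prompt -= lr * grad
    losses.append(loss(prompt))

print(f"trainable params: {prompt.size} vs frozen params: {W.size}")
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The point of the sketch is the parameter count: tuning touches `d` prompt parameters while the `d * d` backbone weights stay fixed, which is what makes the approach parameter-efficient relative to full fine-tuning.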