MuAP: Multi-step Adaptive Prompt Learning for Vision-Language Model with Missing Modality

ACL ARR 2024 June Submission2593 Authors

15 Jun 2024 (modified: 03 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Recently, prompt learning has garnered considerable attention for its success in various Vision-Language (VL) tasks. However, existing prompt-based models primarily study prompt generation and prompt strategies under complete-modality settings, which does not accurately reflect real-world scenarios where partial modality information may be missing. In this paper, we present the first comprehensive investigation into prompt learning behavior when modalities are incomplete, revealing the high sensitivity of prompt-based models to missing modalities. To this end, we propose a novel $\underline{\textbf{Mu}}$lti-step $\underline{\textbf{A}}$daptive $\underline{\textbf{P}}$rompt Learning ($\textbf{MuAP}$) framework, which generates multimodal prompts and performs multi-step prompt tuning, adaptively learning knowledge by iteratively aligning modalities. Specifically, we generate prompts for each modality and devise prompt strategies to integrate them into the Transformer model. Subsequently, we perform prompt tuning sequentially, first in a single-modality stage and then in an alignment stage, allowing each modality prompt to be learned autonomously and adaptively. This mitigates the imbalance issue in previous works, where only textual prompts are learnable. Extensive experiments demonstrate the effectiveness of MuAP, which achieves significant improvements over the state-of-the-art on all benchmark datasets. Our codes are available at https://anonymous.4open.science/r/multiview_adaptative_prompt_learning/
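As a rough illustration of the prompt-construction idea described in the abstract, the sketch below prepends a learnable prompt per modality to that modality's token embeddings, and falls back to the prompt alone when the modality is missing. This is a minimal NumPy sketch under assumed names and dimensions (`build_input`, `D`, `P` are hypothetical), not the paper's actual implementation, which integrates the prompts into a Transformer and tunes them in multiple steps.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # embedding dimension (assumed for illustration)
P = 4  # prompt length per modality (assumed for illustration)

# One learnable prompt per modality; in MuAP both are tunable,
# unlike prior work where only the textual prompt is learned.
text_prompt = rng.normal(size=(P, D))
image_prompt = rng.normal(size=(P, D))

def build_input(text_emb, image_emb):
    """Prepend each modality's prompt to its token embeddings.

    If a modality is missing (None), its side reduces to the prompt
    alone, so the Transformer still receives a learnable placeholder
    for that modality.
    """
    parts = []
    for prompt, emb in ((text_prompt, text_emb), (image_prompt, image_emb)):
        parts.append(prompt if emb is None else np.concatenate([prompt, emb]))
    return np.concatenate(parts)

# Complete-modality input: (4 prompt + 5 text) + (4 prompt + 3 image) tokens.
full = build_input(rng.normal(size=(5, D)), rng.normal(size=(3, D)))
# Missing-image input: the image side is represented by its prompt alone.
missing = build_input(rng.normal(size=(5, D)), None)
```

In a full model these prompt matrices would be trainable parameters updated in the single-modality and alignment tuning stages, rather than fixed random arrays.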
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: vision language navigation, multimodality
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 2593