CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning

Yu Feng; Zhen Tian; Yifan Zhu; Zongfu Han; Haoran Luo; Guangwei Zhang; Meina Song

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | CC BY 4.0

Abstract: The key challenge of cross-modal domain-incremental learning (DIL) is to enable the model to learn continuously from novel data with different feature distributions under the same task, without forgetting previously learned domains. However, existing top-performing methods still suffer from high forgetting rates because they lack intra-domain knowledge extraction and an inter-domain common prompting strategy. In this paper, we propose a simple yet effective framework, CP-Prompt, which trains a limited number of parameters to instruct a pre-trained model to learn new domains while avoiding forgetting of existing feature distributions. CP-Prompt captures intra-domain knowledge by compositionally inserting personalized prompts into multi-head self-attention layers, and then learns inter-domain knowledge with a common prompting strategy. CP-Prompt outperforms state-of-the-art baselines on three widely evaluated DIL tasks. The source code is available at https://anonymous.4open.science/r/CP_Prompt-C126.
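
The abstract describes two kinds of prompts: personalized (per-domain) prompts inserted into self-attention layers, and a common prompt shared across domains. The sketch below illustrates one plausible reading of that idea and is not the authors' implementation. It assumes a single attention head with identity projections, a shared prompt prepended to the input tokens, and domain-specific prefixes prepended only to the keys and values. All function and parameter names (`prompted_attention`, `common_prompt`, `domain_prefix_k`, `domain_prefix_v`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
        )
    return outputs


def prompted_attention(tokens, common_prompt, domain_prefix_k, domain_prefix_v):
    """Illustrative prompt composition (an assumption, not the paper's code).

    The shared prompt is prepended to the input sequence; the per-domain
    prefixes extend only the keys and values, so the output has one vector
    per element of (common_prompt + tokens).
    """
    sequence = common_prompt + tokens
    keys = domain_prefix_k + sequence
    values = domain_prefix_v + sequence
    return attention(sequence, keys, values)


# Toy usage: two input tokens, one shared prompt token, one prefix per domain.
tokens = [[1.0, 0.0], [0.0, 1.0]]
common_prompt = [[0.5, 0.5]]
out_domain_a = prompted_attention(tokens, common_prompt, [[1.0, 1.0]], [[2.0, 2.0]])
out_domain_b = prompted_attention(tokens, common_prompt, [[1.0, 1.0]], [[-2.0, -2.0]])
```

In this toy setup the backbone (here, the attention function itself) stays fixed, and only the prompt vectors would be trained. Swapping the domain prefix changes the output for the same input tokens, which is the intuition behind keeping one small set of personalized parameters per domain.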

Primary Subject Area: [Experience] Multimedia Applications

Relevance To Conference: Multimodal models have garnered significant attention for their ability to process and integrate diverse types of data. In practical applications, however, these models often face shifting data feature distributions. Such shifts can degrade model performance, especially when the model relies on accumulating and updating data over an extended period. Forgetting is particularly pronounced in this setting: it impairs the model's ability to adapt to new data and reduces its recognition accuracy on previously encountered data. Consequently, minimizing forgetting rates in multimodal models, so as to improve their long-term learning and generalization capabilities, has become a pressing issue for researchers. We present CP-Prompt, a simple yet effective prompt-tuning framework for cross-modal domain-incremental learning, with a parameter-efficient twin-prompting design that preserves both inter-domain common knowledge and intra-domain personalized knowledge.

Supplementary Material: zip

Submission Number: 4213
