Generating Prompts in Latent Space for Rehearsal-free Continual Learning

Published: 20 Jul 2024, Last Modified: 06 Aug 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Continual learning is a framework for training a model on a sequence of tasks without forgetting previously learned knowledge, and it has been applied in multiple multimodal scenarios. Recently, prompt-based continual learning has achieved excellent domain adaptability and knowledge transfer through prompt generation. However, existing methods mainly focus on designing the architecture of the generator while neglecting the importance of providing effective guidance for training it. To address this issue, we propose Generating Prompts in Latent Space (GPLS), which treats prompts as latent variables to account for the uncertainty of prompt generation, in line with the fact that prompts are inserted into hidden-layer outputs and exert an implicit influence on classification. GPLS adopts a trainable encoder that encodes task and feature information into prompts via the reparameterization trick, and it provides refined and targeted guidance for the training process through an evidence lower bound (ELBO) based on the Mahalanobis distance. Extensive experiments demonstrate that GPLS achieves state-of-the-art performance on various benchmarks. Our code is available at https://github.com/Hifipsysta/GPLS.
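For intuition only, the sketch below (PyTorch, with illustrative names and dimensions) shows what generating prompts in latent space via the reparameterization trick can look like. It is a hypothetical simplification, not the paper's implementation: the Mahalanobis-distance-based ELBO guidance described in the abstract is reduced here to a standard-normal KL term.

```python
import torch
import torch.nn as nn

class LatentPromptGenerator(nn.Module):
    """Hypothetical sketch: encode pooled features into latent prompts
    via the reparameterization trick (names and sizes are illustrative)."""

    def __init__(self, feat_dim=768, prompt_len=8, prompt_dim=768):
        super().__init__()
        self.prompt_len = prompt_len
        self.prompt_dim = prompt_dim
        # Encoder heads predict the mean and log-variance of the latent prompt.
        self.to_mu = nn.Linear(feat_dim, prompt_len * prompt_dim)
        self.to_logvar = nn.Linear(feat_dim, prompt_len * prompt_dim)

    def forward(self, feats):
        # feats: (batch, feat_dim) pooled features from a frozen backbone.
        mu = self.to_mu(feats)
        logvar = self.to_logvar(feats)
        # Reparameterization: sample prompts while keeping gradients.
        eps = torch.randn_like(mu)
        z = mu + eps * torch.exp(0.5 * logvar)
        prompts = z.view(-1, self.prompt_len, self.prompt_dim)
        # Simplified KL regularizer against a standard-normal prior
        # (stands in for the ELBO guidance; not the paper's actual objective).
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return prompts, kl

# Usage sketch: prepend the sampled prompts to hidden-layer tokens and add
# the (weighted) KL term to the classification loss.
gen = LatentPromptGenerator()
prompts, kl = gen(torch.randn(4, 768))
```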
Primary Subject Area: [Content] Vision and Language
Secondary Subject Area: [Generation] Multimedia Foundation Models
Relevance To Conference: Continual learning is a framework for training models on a sequence of tasks without forgetting previously learned knowledge, and it has been applied in multiple multimodal scenarios. Over the past year, continual learning has been applied to cross-modal retrieval, visual question answering, and vision-language models. Continual learning enhances the ability of multimodal models to learn a wider range of tasks, thus contributing to the advancement of general-purpose artificial intelligence. Moreover, prompt learning establishes a bridge between vision and language, and pre-trained models achieve better generalization by training prompts rather than the backbone. Therefore, our research on prompt-based continual learning makes a contribution to multimedia.
Supplementary Material: zip
Submission Number: 2155