Keywords: prompt learning, adapter learning, unbalanced optimal transport, large vision-language model
Verify Author List: I have double-checked the author list and understand that additions and removals will not be allowed after the submission deadline.
TL;DR: Prompt learning for large vision-language models
Abstract: Prompt learning methods are gaining attention for their ability to adapt large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically optimize a single unified prompt input and often struggle on fine-grained classification tasks because such prompts lack sufficiently discriminative attributes. To address this, we propose a framework built on dual contexts, one domain-shared and one class-specific, where the latter is generated by Large Language Models (LLMs) such as GPTs. These dual prompts enhance the model's feature representation by combining implicit and explicit factors encoded in LLM knowledge. Moreover, we use Unbalanced Optimal Transport (UOT) to quantify the relationships between the constructed prompts and visual tokens. Through partial matching, UOT can align discrete sets of visual tokens and prompt embeddings with different mass distributions, which is particularly valuable for handling irrelevant or noisy elements because the mass-preservation constraint no longer restricts transport solutions. Furthermore, UOT integrates naturally with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings show that our model outperforms current state-of-the-art baselines.
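The partial matching between visual tokens and prompt embeddings described in the abstract can be illustrated with a generic entropic UOT solver. The sketch below is not the paper's implementation: it assumes a cosine cost, uniform token and prompt masses, KL-relaxed marginals, and the standard unbalanced Sinkhorn scaling iterations. The function names, the hyperparameters `eps` and `rho`, and the random features are all hypothetical placeholders.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between rows of x and rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def unbalanced_sinkhorn(a, b, cost, eps=0.1, rho=1.0, n_iters=200):
    """Entropic UOT with KL-relaxed marginals (generic scaling iterations).

    eps is the entropic regularization strength and rho the marginal
    relaxation strength; both values here are illustrative assumptions.
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)        # exponent < 1 softens the marginal constraints
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]   # transport plan

# Hypothetical stand-ins for encoder outputs: 49 visual tokens, 4 prompts.
rng = np.random.default_rng(0)
visual_tokens = rng.normal(size=(49, 64))
prompt_embeds = rng.normal(size=(4, 64))

a = np.full(49, 1.0 / 49)            # assumed uniform mass on visual tokens
b = np.full(4, 1.0 / 4)              # assumed uniform mass on prompts
cost = cosine_cost(visual_tokens, prompt_embeds)
plan = unbalanced_sinkhorn(a, b, cost)
uot_distance = float((plan * cost).sum())
```

Because the marginals are only softly enforced, the row and column sums of `plan` need not equal `a` and `b`. That relaxation is what lets costly or irrelevant tokens receive little transported mass. A larger `rho` pushes the solution back toward balanced optimal transport.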
A Signed Permission To Publish Form In Pdf: pdf
Primary Area: General Machine Learning (active learning, bayesian machine learning, clustering, imitation learning, learning to rank, meta-learning, multi-objective learning, multiple instance learning, multi-task learning, neuro-symbolic methods, etc.)
Paper Checklist Guidelines: I certify that all co-authors of this work have read and commit to adhering to the guidelines in Call for Papers.
Student Author: Yes
Submission Number: 199