Contextual Kernels for Task-Aware Fine-Tuning in Vision-Language Models

ICLR 2025 Conference Submission 13744 Authors

28 Sept 2024 (modified: 28 Nov 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Continual Learning, Model Adaptability, Task Incremental Learning, Kernel-Based Task Representation Learning
TL;DR: We propose a method leveraging Vision-Language Models (VLMs) and contextual generation to enhance task adaptability while preserving generality, achieving state-of-the-art performance in dynamic Task Incremental Learning (TIL) scenarios.
Abstract: Vision-Language Models (VLMs) exhibit impressive generalization due to training on vast datasets such as ImageNet. However, their performance diminishes on unfamiliar tasks. While downstream fine-tuning enhances adaptability, it often sacrifices the models' inherent generality. To address this, we propose a novel method that leverages contextual generation for enhanced task representation within a semantic space. Our approach uses VLMs to generate detailed contextual descriptions for test image batches and builds a Contextual Kernel (CK) for each class in the semantic space. Our test-time fine-tuning preserves core VLM features by freezing the fundamental components and adding a linear network for semantic kernel density projection. This strategy significantly improves model adaptability on real-world tasks. Although the model already exhibits strong zero-shot capabilities, we further exploit additional training samples to improve adaptability in dynamic Task Incremental Learning (TIL) scenarios. Each task's unique CK distribution serves as a fingerprint, enabling high-performance TIL with minimal forgetting. Experiments on four TIL datasets demonstrate the efficacy of our framework, which achieves state-of-the-art performance. Our findings reveal that the semantic space of the text modality encapsulates both the generality and the adaptability of VLMs, paving the way for robust applications in diverse, evolving task environments. This work systematically balances generality and adaptability in VLMs, addressing a critical gap in current research.
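To make the abstract's pipeline concrete, below is a minimal, illustrative sketch of the idea as I read it: per-class Contextual Kernels modeled as Gaussian kernel densities over text embeddings of generated descriptions, with a small trainable linear head on top of a frozen encoder. This is not the authors' released code; names and details (ContextualKernel, bandwidth, embedding dimension, the stand-in random embeddings) are assumptions for illustration only.

```python
# Hedged sketch of per-class "contextual kernels" + a trainable linear projection.
# The exact formulation in the paper may differ; frozen-VLM embeddings are
# replaced here by random tensors so the example runs standalone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextualKernel:
    """Gaussian kernel density over the text embeddings of one class's
    generated contextual descriptions (assumed formulation)."""

    def __init__(self, text_embeds: torch.Tensor, bandwidth: float = 0.1):
        self.centers = F.normalize(text_embeds, dim=-1)   # (K, D) description embeddings
        self.bandwidth = bandwidth

    def log_density(self, z: torch.Tensor) -> torch.Tensor:
        z = F.normalize(z, dim=-1)                          # (B, D) projected image embeddings
        sq_dists = torch.cdist(z, self.centers).pow(2)      # (B, K) squared distances to centers
        log_k = -sq_dists / (2 * self.bandwidth ** 2)       # Gaussian kernel (up to a constant)
        return torch.logsumexp(log_k, dim=-1) - torch.log(
            torch.tensor(float(self.centers.shape[0]))
        )


class KernelProjectionHead(nn.Module):
    """Linear projection trained at test time; the VLM encoders stay frozen."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, image_embeds: torch.Tensor) -> torch.Tensor:
        return self.proj(image_embeds)


def classify(image_embeds, head, kernels):
    """Score projected image embeddings against each class's contextual kernel."""
    z = head(image_embeds)
    scores = torch.stack([k.log_density(z) for k in kernels], dim=-1)  # (B, C)
    return scores.argmax(dim=-1), scores


if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n_classes, n_descr = 512, 3, 8
    # Stand-ins for frozen-VLM text embeddings of generated contextual descriptions.
    kernels = [ContextualKernel(torch.randn(n_descr, dim)) for _ in range(n_classes)]
    head = KernelProjectionHead(dim)            # only this head would be trained
    images = torch.randn(4, dim)                # stand-in image embeddings
    preds, scores = classify(images, head, kernels)
    # One plausible task "fingerprint": mean per-class log-density over a batch.
    fingerprint = scores.mean(dim=0)
    print(preds, fingerprint.shape)
```

In this reading, the batch-level distribution of kernel scores acts as the task fingerprint used to route TIL inference, while only the linear head is updated, which is how the frozen backbone preserves generality.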
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13744