Vision-Language Instruction-enhanced Tuning via Parameter-efficient Learning

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Parameter-efficient Learning, Instruction Tuning, Multimodal
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Instruction tuning has shown promising potential for developing general-purpose AI capabilities in large-scale pretrained models. In the multimodal community, this has motivated growing research on enhancing instruction tuning to integrate multimodal information for creative applications. However, existing works suffer from two main limitations: full model fine-tuning incurs high training costs and depends heavily on computing resources, and instructions lack semantic information, which hinders multimodal alignment. In this paper, we propose a novel architecture called Vision-Language Instruction-enhanced Tuning via Parameter-efficient Learning (VITAL). VITAL first enables lightweight model training, using only 2% of the parameters, through automatic mode approximation. More importantly, VITAL enhances instruction semantics from two perspectives: 1) aggregating more context via an enhanced instruction mixture to aid multimodal fusion, and 2) strengthening the connection between the proposed parameter-efficient tuning method and mutual information through a score-based information bottleneck. Validation experiments on six multimodal downstream benchmarks demonstrate that VITAL outperforms state-of-the-art approaches in most cases, even surpassing full fine-tuning. In addition, extensive experiments in the few-shot setting, together with various visualization analyses, further validate our advantages.
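The abstract gives no implementation details for the automatic mode approximation, so the snippet below is only a rough sketch of what a mode-approximation-style parameter-efficient adapter could look like: a frozen pretrained linear layer augmented with a small trainable low-rank update, so that only a few percent of the parameters receive gradients. All names here (`ModeApproximationAdapter`, `AdaptedLinear`, the `rank` hyperparameter) are our own illustrative placeholders, not the paper's API.

```python
import torch
import torch.nn as nn

class ModeApproximationAdapter(nn.Module):
    """Hypothetical low-rank adapter: approximates the weight update of a
    frozen layer with trainable factors U (d_in x r), s (r), V (r x d_out)."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.u = nn.Parameter(torch.randn(d_in, rank) * 0.01)
        self.s = nn.Parameter(torch.ones(rank))          # shared "mode" coefficients
        self.v = nn.Parameter(torch.zeros(rank, d_out))  # zero-init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to x @ (U diag(s) V), without materializing the full matrix.
        return ((x @ self.u) * self.s) @ self.v


class AdaptedLinear(nn.Module):
    """Frozen pretrained linear layer plus the trainable adapter branch."""

    def __init__(self, pretrained: nn.Linear, rank: int = 8):
        super().__init__()
        for p in pretrained.parameters():
            p.requires_grad = False  # backbone stays frozen
        self.pretrained = pretrained
        self.adapter = ModeApproximationAdapter(
            pretrained.in_features, pretrained.out_features, rank
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pretrained(x) + self.adapter(x)


# Usage: wrap one projection of a frozen vision-language backbone.
layer = AdaptedLinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.2%}")  # ~2% for these sizes
```

Per layer, the trainable fraction under such a scheme is roughly rank * (d_in + d_out) / (d_in * d_out), which is how parameter budgets on the order of 2% become attainable.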
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 889