A comprehensive survey of Vision-Language Models: Pretrained models, fine-tuning, prompt engineering, adapters, and benchmark datasets
Abstract Highlights:
• Comprehensive analysis of VLM components like tuning, prompts, and datasets.
• Explores adapter-based tuning and low-resource learning for efficiency gains.
• Covers advances in contrastive pre-training and prompt engineering methods.
• Discusses challenges in benchmarking, data diversity, and annotation quality.
• Highlights future VLM challenges, including ethics, scaling, and adaptation.