Diff-ZsVQA: Zero-shot Visual Question Answering with Frozen Large Language Models Using Diffusion Model

Quanxing Xu, Jian Li, Yuhao Tian, Ling Zhou, Feifei Zhang, Rubing Huang

Published: 2025, Last Modified: 11 Apr 2025. Expert Syst. Appl. 2025. License: CC BY-SA 4.0

Abstract: Highlights
• This paper is the first work to use diffusion models in LLM-based VQA.
• This paper proposes a novel prompt that reduces the impact of the questions.
• For zero-shot VQA, the proposed method is compatible with different LLMs.
• Compared with other methods, the proposed method achieves comparable results at higher speed.