Diff-ZsVQA: Zero-shot Visual Question Answering with Frozen Large Language Models Using Diffusion Model

Published: 01 Jan 2025, Last Modified: 11 Apr 2025. Venue: Expert Syst. Appl. 2025. License: CC BY-SA 4.0
Abstract: Highlights
• This paper is the first work to use diffusion models in LLM-based VQA.
• This paper proposes a novel prompt that reduces the impact of the questions.
• For zero-shot VQA, the proposed method is compatible with different LLMs.
• Compared with other methods, it achieves comparable results at higher speed.