Diff-ZsVQA: Zero-shot Visual Question Answering with Frozen Large Language Models Using Diffusion Model
Abstract: Highlights
• This paper is the first work to use diffusion models in LLM-based VQA.
• This paper proposes a novel prompt that reduces the impact of the questions.
• For zero-shot VQA, the proposed method is compatible with different LLMs.
• Compared with other methods, it achieves comparable results at higher speed.