CooKie: commonsense knowledge-guided mixture-of-experts framework for fine-grained visual question answering

Published: 01 Jan 2025, Last Modified: 01 Aug 2025, Inf. Sci. 2025, CC BY-SA 4.0
Abstract: Highlights
• Our novel framework makes fine-grained visual question answering more grounded.
• We annotate additional instances to further analyze our method.
• Our method achieves state-of-the-art performance on V* Bench, improving accuracy by 4%.