OpenViVQA: Task, dataset, and multimodal fusion models for visual question answering in Vietnamese

Published: 01 Jan 2023, Last Modified: 11 Apr 2025
Venue: Inf. Fusion 2023
License: CC BY-SA 4.0
Abstract: Highlights
• We define a new form of Visual Question Answering (VQA): open-ended VQA.
• We introduce the Open-domain Vietnamese Visual Question Answering (OpenViVQA) dataset.
• We propose novel multimodal fusion models that perform human-like answer generation.
• Our experiments and results show that open-ended VQA is a challenging task.