OpenViVQA: Task, dataset, and multimodal fusion models for visual question answering in Vietnamese

Nghia Hieu Nguyen, Duong T. D. Vo, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Published: 2023, Last Modified: 11 Apr 2025, Inf. Fusion 2023, CC BY-SA 4.0

Abstract: Highlights
• We define a new form of Visual Question Answering (VQA) - the open-ended VQA.
• We introduce the Open-domain Vietnamese Visual Question Answering (OpenViVQA) dataset.
• We propose novel multimodal fusion models that perform human-like answer generation.
• Our experiments and results show that open-ended VQA is a challenging task.