Abstract: Highlights
• We define a new form of Visual Question Answering (VQA) - the open-ended VQA.
• We introduce the Open-domain Vietnamese Visual Question Answering (OpenViVQA) dataset.
• We propose novel multimodal fusion models that perform human-like answer generation.
• Our experiments and results show that open-ended VQA is a challenging task.