- Abstract: Relational reasoning methods based on graph networks are currently the state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although these models use graph networks to enrich visual representations by encoding question-adaptive inter-object relations, such simple graph networks are arguably insufficient for visual reasoning in VQA tasks. In this paper, we propose a Reasoning-Aware Graph Convolutional Network (RA-GCN) that takes GCNs one step further towards visual reasoning. Our first contribution is the introduction of visual reasoning ability into conventional GCNs. Second, we strengthen the expressive power of GCNs by introducing node-sensitive kernel parameters based on edge features, which addresses the limitation that conventional GCNs share a single transformation matrix across all nodes. Finally, we provide a novel iterative reasoning architecture for VQA by embedding the RA-GCN module into an iterative process. We evaluate our model on the VQA-CP v2, GQA, and Clevr datasets. Our final RA-GCN network achieves state-of-the-art accuracy of 42.3% on VQA-CP v2, a highly competitive 62.4% on GQA, and 90.0% on the val split of Clevr.
- Keywords: graph convolutional networks, visual reasoning, visual question answering
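
The abstract does not give the RA-GCN equations, so the following is only a minimal sketch of the general idea of edge-conditioned kernels applied iteratively, not the authors' formulation. The kernel form (a base matrix plus an edge-feature-weighted sum of basis matrices), the mean aggregation, the ReLU, and all names (`edge_conditioned_kernel`, `gcn_step`, `iterate`) are assumptions made for illustration. Edge features are taken as given; in a VQA setting they would presumably depend on the question and the object pair.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, x)) for row in w]


def edge_conditioned_kernel(base: Matrix, edge_basis: List[Matrix],
                            edge_feat: Vector) -> Matrix:
    """Assumed form: kernel = base + sum_k edge_feat[k] * edge_basis[k].

    A plain GCN would use `base` for every edge; here each edge gets its
    own transformation, driven by its edge feature vector.
    """
    kernel = [row[:] for row in base]
    for coeff, basis in zip(edge_feat, edge_basis):
        for r, basis_row in enumerate(basis):
            for c, value in enumerate(basis_row):
                kernel[r][c] += coeff * value
    return kernel


def gcn_step(nodes: List[Vector], edges: Dict[Tuple[int, int], Vector],
             base: Matrix, edge_basis: List[Matrix]) -> List[Vector]:
    """One message-passing step; `edges` maps (target, source) to edge features."""
    dim = len(base)
    updated = []
    for i in range(len(nodes)):
        total = [0.0] * dim
        count = 0
        for (tgt, src), feat in edges.items():
            if tgt != i:
                continue
            kernel = edge_conditioned_kernel(base, edge_basis, feat)
            msg = matvec(kernel, nodes[src])
            total = [t + m for t, m in zip(total, msg)]
            count += 1
        if count == 0:
            # Nodes without incoming edges keep their features unchanged.
            updated.append(list(nodes[i]))
        else:
            # Mean aggregation followed by ReLU (both are assumptions).
            updated.append([max(0.0, t / count) for t in total])
    return updated


def iterate(nodes: List[Vector], edges: Dict[Tuple[int, int], Vector],
            base: Matrix, edge_basis: List[Matrix], steps: int) -> List[Vector]:
    """Apply the same step repeatedly, mimicking an iterative reasoning loop."""
    for _ in range(steps):
        nodes = gcn_step(nodes, edges, base, edge_basis)
    return nodes
```

With a zero edge feature the kernel reduces to the shared `base` matrix, which is the conventional-GCN case the abstract describes as a limitation; non-zero edge features make the transformation differ per node pair.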