Multi-label Image Recognition Based on Multi-modal Graph Convolutional Networks Using Captioning Features

Abstract: This paper presents a multi-label image recognition model based on multi-modal Graph Convolutional Networks (GCNs) that uses captioning features. GCNs have frequently been used to model label dependencies in recent studies of multi-label image recognition. However, these dependencies are built on information taken directly from the images. In this paper, we import captioning features into the GCN-based multi-label image recognition model so that it uses both image and text information. Introducing image captioning into the GCN model can improve the accuracy of multi-label image recognition. Experimental results show the effectiveness of the proposed method.
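The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a pattern common in GCN-based multi-label recognition: a GCN over a label graph turns label embeddings into one classifier vector per label, and those classifiers are applied to a feature vector. Fusing the image and caption features by concatenation, the two-layer depth, all dimensions, and the names `normalize_adjacency`, `gcn_layer`, and `multilabel_scores` are assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w, activate=True):
    """One graph convolution: A_norm @ H @ W, optionally followed by ReLU."""
    out = a_norm @ h @ w
    return np.maximum(out, 0.0) if activate else out

def multilabel_scores(image_feat, caption_feat, label_emb, adj, w1, w2):
    """Per-label probabilities from fused image and caption features."""
    # Assumption: the two modalities are fused by simple concatenation.
    fused = np.concatenate([image_feat, caption_feat])
    a_norm = normalize_adjacency(adj)
    # Two GCN layers map label embeddings to per-label classifier vectors.
    hidden = gcn_layer(a_norm, label_emb, w1)
    classifiers = gcn_layer(a_norm, hidden, w2, activate=False)
    logits = classifiers @ fused           # shape: (num_labels,)
    return 1.0 / (1.0 + np.exp(-logits))   # independent sigmoid per label

# Toy usage with random placeholders; none of these values come from the paper.
rng = np.random.default_rng(0)
num_labels, emb_dim, hidden_dim, img_dim, cap_dim = 4, 8, 16, 32, 12
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)  # hypothetical label co-occurrence graph
label_emb = rng.normal(size=(num_labels, emb_dim))
w1 = 0.1 * rng.normal(size=(emb_dim, hidden_dim))
w2 = 0.1 * rng.normal(size=(hidden_dim, img_dim + cap_dim))
image_feat = rng.normal(size=img_dim)      # stand-in for CNN image features
caption_feat = rng.normal(size=cap_dim)    # stand-in for captioning-model features
scores = multilabel_scores(image_feat, caption_feat, label_emb, adj, w1, w2)
```

In this sketch the caption features only widen the vector that the label classifiers are applied to; a real model could equally feed text information into the label graph itself, which the abstract leaves open.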