Attention-Driven Multi-label Image Classification with Semantic Embedding and Graph Convolutional Networks

Abstract: Multi-label image classification is a fundamental and vital task in computer vision. The latest methods, mostly based on deep learning, exhibit excellent performance in image understanding. However, previous studies have captured only the image content information using convolutional neural networks (CNNs), ignoring the semantic structure information and the implicit dependencies between labels and image regions. It is therefore necessary to develop more effective methods for integrating semantic information and visual features in multi-label image classification. In this study, we propose a novel framework for multi-label image classification, named FLNet, which simultaneously exploits visual features and semantic structure. Specifically, to strengthen the association between semantic annotations and image regions, we first integrate an attention mechanism with a CNN so that the model focuses on target regions while ignoring irrelevant surrounding information, and we then employ a graph convolutional network (GCN) to capture the structural information among multiple labels. On top of this architecture, we introduce lateral connections that repeatedly inject the label system into the CNN backbone during GCN learning, improving performance and yielding interdependent classifiers for each image label. Experiments on two public multi-label benchmark datasets, MS-COCO and the PASCAL Visual Object Classes challenge (VOC 2007), demonstrate that our approach outperforms existing state-of-the-art methods. By using semantic information together with the attention mechanism, our method learns specific target regions and strengthens the association between labels and image regions, combining the advantages of visual and semantic information to further improve classification performance.
Finally, the correctness and effectiveness of the proposed method are demonstrated by visualizing the classification results.
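The pipeline described in the abstract (label-guided attention over CNN region features, a GCN over the label graph that produces interdependent per-label classifiers, and independent sigmoid scores per label) can be illustrated with a minimal NumPy sketch. All shapes, the random label-correlation adjacency, and the single GCN layer here are illustrative assumptions, not the paper's actual FLNet implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical shapes: R spatial regions from the CNN feature map,
# D-dim features, C labels.
rng = np.random.default_rng(0)
R, D, C = 49, 512, 20

feat = rng.standard_normal((R, D))        # flattened CNN feature map
label_emb = rng.standard_normal((C, D))   # semantic label embeddings (e.g. projected word vectors)

# 1) Attention: each label attends over image regions to pool a
#    label-specific visual feature, linking labels to image regions.
attn = softmax(label_emb @ feat.T, axis=1)   # (C, R) attention weights
pooled = attn @ feat                         # (C, D) attended features

# 2) One GCN layer over a toy label co-occurrence graph to produce
#    interdependent classifiers (symmetrically normalized adjacency).
A = (rng.random((C, C)) > 0.7)
A = np.maximum(A, A.T).astype(float) + np.eye(C)   # symmetrize, self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = d_inv_sqrt @ A @ d_inv_sqrt                # normalized adjacency
W = rng.standard_normal((D, D)) * 0.05
classifiers = np.maximum(A_hat @ label_emb @ W, 0) # (C, D), ReLU

# 3) Score each label: dot product of its GCN-refined classifier with
#    its attended visual feature, then an independent sigmoid per label.
logits = (classifiers * pooled).sum(axis=1)        # (C,)
probs = 1.0 / (1.0 + np.exp(-logits))
```

In this sketch the multi-label nature shows up in the final step: each label gets its own sigmoid score rather than competing in a single softmax, while the GCN step is what makes the classifiers depend on one another through the label graph.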