Deep Convolutional Neural Network for Correlating Images and SentencesOpen Website

Published: 01 Jan 2018, Last Modified: 23 May 2023MMM (1) 2018Readers: Everyone
Abstract: In this paper, we address the problem of image sentence matching and propose a novel convolutional neural network architecture which includes three modules: the visual module for composing fragmental features of images, the textual module for composing fragmental features of sentences, and the fusional module for encoding features of image and sentence fragments jointly to generate final matching scores of image sentence pairs. Different with previous fragment level models, the proposed method represents fragments of images as feature maps generated by CNN, which is more reasonable and effective. By allowing independent and specialized fragmental feature representations to be leveraged for each modality like image or text, the proposed method is flexible in interlinking the intermediate fragmental features to generate a joint abstraction of two modalities, which provides better matching scores. Extensive evaluations on two benchmark datasets have validated the competitive performance of our approach compared to the state-of-the-art bidirectional image sentence retrieval approaches.
0 Replies

Loading