Adaptive multimodal fusion with web resources for scene classification

Dongzhe Wang, Kezhi Mao, Gee Wah Ng, Tien Pham

Published: 2016, Last Modified: 08 Mar 2025. Venue: FUSION 2016. License: CC BY-SA 4.0

Abstract: Training a scene classifier that generalizes well often requires a large number of human-labeled training images, but such well-labeled images are not always available. To alleviate this problem, a web resources-aided scene classification framework was previously proposed. The present paper extends that framework [1] with two improvements. First, a text-based filtering algorithm is developed to remove irrelevant web search returns, since these provide irrelevant or even wrong information about the class of an image. Second, an adaptive fusion algorithm is developed to integrate the visual feature-based and web textual feature-based classification results. The adaptive fusion algorithm is inspired by the multisensory integration mechanism of humans, whose adaptability is achieved by reliability-dependent weighting of the different sensory modalities. Experimental results show that the proposed web textual resources-aided image classification framework can improve the classification accuracy of some classes by 13% and 12% on the UIUC-Sports and LabelMe8 datasets, respectively.
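The abstract does not spell out the fusion rule, so the following is only a minimal sketch of what reliability-dependent weighting of two modalities could look like, not the authors' algorithm. It assumes each classifier outputs a class-probability vector and that reliability can be approximated by one minus the normalized entropy of that vector. Both that reliability measure and the names `reliability` and `fuse` are hypothetical choices made for illustration.

```python
import math


def reliability(probs):
    """Assumed reliability proxy: 1 - normalized entropy.

    Close to 1 for a confident (peaked) prediction, close to 0 for a
    uniform one. This measure is an illustrative assumption, not taken
    from the paper. Requires at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return max(0.0, 1.0 - entropy / math.log(len(probs)))


def fuse(p_visual, p_text, eps=1e-9):
    """Fuse two class-probability vectors with reliability-dependent weights.

    Returns the fused vector and the weights given to each modality.
    """
    r_v = reliability(p_visual) + eps  # eps avoids division by zero
    r_t = reliability(p_text) + eps
    w_v = r_v / (r_v + r_t)
    w_t = 1.0 - w_v
    fused = [w_v * a + w_t * b for a, b in zip(p_visual, p_text)]
    return fused, w_v, w_t


# Hypothetical example: an ambiguous visual prediction and a confident
# web-text prediction over three classes.
p_visual = [0.40, 0.35, 0.25]
p_text = [0.05, 0.90, 0.05]
fused, w_v, w_t = fuse(p_visual, p_text)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this sketch the more confident modality dominates the fused decision on a per-image basis, which is one simple way to realize the adaptability described above. A reliability estimate based on held-out accuracy per modality would be an equally plausible alternative.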