From Less to More: Common-Sense Semantic Perception Benefits Image Captioning

Published: 2022, Last Modified: 23 Jan 2026, APWeb/WAIM (2) 2022, CC BY-SA 4.0
Abstract: Most recent work in image captioning relies solely on the information contained in the image or on modeling the internal relations among visual features, which fails to generate informative captions in some cases. Part of what defines humans is the ability to reason with common sense when making semantic associations, an ability that machines lack. To this end, we propose a Common-Sense Aware method (CSA) for image captioning, which capitalizes on general prior knowledge to associate extra semantic information during generation and thereby infer more informative captions. Specifically, based on ConceptNet, we extract common-sense knowledge features using pre-generated concepts to provide comprehensive associated semantic information for captioning. We conduct extensive experiments on the MS COCO dataset to demonstrate the effectiveness of CSA; the results show that it improves on state-of-the-art methods.
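As a rough illustration of the idea of associating extra semantic information with pre-generated concepts, the sketch below expands a list of detected concepts using a small knowledge graph. It is not the CSA method itself: the edge list `TOY_EDGES`, the function `expand_concepts`, and the weight-sum ranking are all assumptions made for this example, and the triples are hand-written placeholders in the style of ConceptNet rather than data drawn from it.

```python
from collections import defaultdict

# Hypothetical, hand-written edges shaped like ConceptNet triples:
# (start concept, relation, end concept, weight). Not real ConceptNet data.
TOY_EDGES = [
    ("surfboard", "UsedFor", "surfing", 2.0),
    ("surfboard", "AtLocation", "beach", 1.5),
    ("wave", "AtLocation", "ocean", 2.5),
    ("wave", "RelatedTo", "surfing", 1.0),
    ("person", "CapableOf", "swimming", 0.5),
]


def expand_concepts(concepts, edges, top_k=3):
    """Return up to top_k associated concepts, ranked by summed edge weight.

    Assumed scoring rule for illustration only: a candidate's score is the
    sum of the weights of all edges that lead to it from a detected concept.
    """
    seen = set(concepts)
    scores = defaultdict(float)
    for start, _relation, end, weight in edges:
        # Only follow edges that start at a detected concept and lead
        # to a concept that is not already in the detected set.
        if start in seen and end not in seen:
            scores[end] += weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _score in ranked[:top_k]]


# Concepts assumed to have been pre-generated from an image.
detected = ["person", "surfboard", "wave"]
associated = expand_concepts(detected, TOY_EDGES)
print(associated)
```

In a captioning model, the associated concepts would presumably be embedded and fed to the decoder alongside the visual features; how CSA does this is not specified in the abstract.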