Triple-Stream Commonsense Circulation Transformer Network for Image Captioning

Published: 2024, Last Modified: 19 Feb 2025. Comput. Vis. Image Underst. 2024. License: CC BY-SA 4.0
Abstract: Highlights
• We combine innovative commonsense knowledge with channel and region features.
• We develop a channel attention mechanism that ensures pure semantic features.
• We verify that the soft router mechanism is an effective fusion method for TCCTN.
• We achieve good results on the MS-COCO dataset and verify the contextual knowledge of TCCTN.