Fully-attentive iterative networks for region-based controllable image and video captioning

Marcella Cornia, Lorenzo Baraldi, Ayellet Tal, Rita Cucchiara

Published: 2023, Last Modified: 14 Nov 2024. Comput. Vis. Image Underst. 2023. CC BY-SA 4.0

Abstract: Highlights
• We propose a fully-attentive and iterative network for controllable image captioning.
• We design novel attention operators that can deal with region-based control signals.
• We introduce a decoder which explicitly focuses on each part of the control signal.
• State-of-the-art performance on both image and video controllable captioning.