Implicit and explicit commonsense for multi-sentence video captioning

Published: 01 Jan 2024, Last Modified: 20 Feb 2025Comput. Vis. Image Underst. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Highlights•A new task of video-based instruction generation that requires commonsense knowledge.•A new model using implicit and explicit commonsense to enhance sentence prediction.•We analyze the contributions of knowledge made toward improved caption quality.•Our model also achieves state-of-the-art performance on dense video captioning tasks.
Loading