Implicit and explicit commonsense for multi-sentence video captioning

Shih-Han Chou, James J. Little, Leonid Sigal

Published: 2024, Last Modified: 20 Feb 2025. Comput. Vis. Image Underst. 2024. License: CC BY-SA 4.0

Abstract: Highlights
• A new task of video-based instruction generation that requires commonsense knowledge.
• A new model that uses implicit and explicit commonsense to enhance sentence prediction.
• We analyze how this knowledge contributes to improved caption quality.
• Our model also achieves state-of-the-art performance on dense video captioning tasks.