Collaborative three-stream transformers for video captioning

Published: 2023. Last Modified: 14 May 2025. Venue: Comput. Vis. Image Underst. 2023. License: CC BY-SA 4.0.
Abstract: Highlights
• COST is proposed as a novel multi-branch framework for video captioning.
• A dedicated module aligns the interactions across branches, leading to more precise captions.
• A proposed training objective enhances COST by constraining the semantics of the embeddings.
• Extensive experiments show that our method performs favorably against state-of-the-art methods.