Fully exploring object relation interaction and hidden state attention for video captioning

Published: 01 Jan 2025, Last Modified: 16 Nov 2025 · Pattern Recognit. 2025 · CC BY-SA 4.0
Abstract: Highlights
• We design an Object Relation Graph Interaction (ORGI) module to capture information about objects and their relations.
• To enable adequate information flow across all nodes, we construct a global node that connects to every graph node.
• We propose a hidden State and Attention Enhanced Decoder (SAED) that concatenates hidden states and updated attentions to improve prediction of the next word.
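The highlights give no equations, so the following is only a minimal pure-Python sketch, under our own assumptions, of the two structural ideas they name: a graph whose extra global node is linked to every object node, and a decoder input formed by concatenating a hidden state with an attention vector. The function names, the undirected edges with self-loops, the mean-aggregation update and the linear word scorer are all hypothetical illustrative choices, not the ORGI or SAED formulation from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def build_adjacency_with_global_node(
    num_objects: int, relations: Sequence[Tuple[int, int]]
) -> List[List[int]]:
    """Adjacency over object nodes plus one global node (hypothetical layout).

    Nodes 0..num_objects-1 are objects; node num_objects is the global node.
    Relation edges are treated as undirected, and every node gets a self-loop.
    """
    n = num_objects + 1
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # self-loop keeps each node's own features
    for a, b in relations:
        adj[a][b] = adj[b][a] = 1  # object-object relation edge
    g = num_objects
    for i in range(num_objects):
        adj[i][g] = adj[g][i] = 1  # global node linked to every object node
    return adj


def propagate(adj: List[List[int]], features: List[Vector]) -> List[Vector]:
    """One round of mean aggregation over each node's neighbourhood."""
    dim = len(features[0])
    out = []
    for row in adj:
        neighbours = [features[j] for j, connected in enumerate(row) if connected]
        out.append([sum(f[k] for f in neighbours) / len(neighbours) for k in range(dim)])
    return out


def decoder_input(hidden: Vector, attended: Vector) -> Vector:
    """Concatenate a decoder hidden state with an attention context vector."""
    return hidden + attended


def predict_next_word(z: Vector, weights: List[Vector], vocab: List[str]) -> str:
    """Score each vocabulary word linearly from z and return the highest-scoring one."""
    scores = [sum(w * x for w, x in zip(row, z)) for row in weights]
    return vocab[max(range(len(vocab)), key=scores.__getitem__)]


# Three objects; only objects 0 and 1 share a relation edge, object 2 is isolated.
adj = build_adjacency_with_global_node(3, [(0, 1)])
updated = propagate(adj, [[1.0], [2.0], [3.0], [0.0]])  # last row: global node
word = predict_next_word(decoder_input([1.0], [0.0]), [[0.0, 1.0], [1.0, 0.0]], ["a", "b"])
```

In this toy graph object 2 has no relation edge to object 0, yet after one round the global node already holds a mixture of all three objects, so a second round passes object 2's information to object 0. That two-hop path between any pair of objects is the kind of full-graph information flow the second highlight describes; how the paper actually initializes and updates its global node is not stated here.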