When I Fall in Love: Capturing Video-Oriented Social Relationship Evolution via Attentive GNN

Published: 01 Jan 2024, Last Modified: 11 Feb 2025. Venue: IEEE Trans. Circuits Syst. Video Technol. 2024. License: CC BY-SA 4.0
Abstract: With the boom in streaming media platforms, viewers have become accustomed to watching dramas and movies online with increasingly intelligent services. In long videos, character relationships often evolve dynamically as the story progresses. Automatic tools that capture the evolution of social relations among characters are therefore needed to enrich the viewing experience. However, most existing works focus on short, isolated video clips. Because they do not account for plot development, they may fail to summarize relationships into holistic semantic representations for the whole video. To address these challenges, we propose a novel Dynamic-Evolutionary Graph Attention Network (DE-GAT) framework that generates an evolving social relation graph among characters and captures the trajectory of their relations throughout the entire video. DE-GAT first integrates multimodal cues, including visual and textual information, in each video clip via a graph attention network (GAT). It then expands the temporal receptive field from clip level to scenario level, so that the factors most relevant to the evolution of social relationships can be explored. Finally, all scenario-level social graphs are merged to obtain the evolving global social graph for the entire movie. Extensive evaluations on the real-world MovieGraphs dataset validate the positive impact of temporal receptive field expansion and multimodal cues on capturing evolving social relations.
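
For readers unfamiliar with graph attention, the clip-level fusion step can be illustrated with a minimal single-head attention layer in plain Python. This is only a sketch of the generic GAT attention mechanism, not the DE-GAT implementation: the node names (`char_A`, `char_B`, `dialogue`), the toy feature vectors, the weight matrix `W`, and the attention vector `a` are all assumptions made for illustration, and the output nonlinearity and multi-head averaging of a full GAT layer are omitted.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gat_layer(features, neighbors, W, a):
    """Single-head graph attention (illustrative sketch only).

    features:  dict node -> feature vector (list of floats)
    neighbors: dict node -> list of neighbor nodes (include the node
               itself if a self-loop is wanted)
    W:         shared linear projection, out_dim x in_dim
    a:         attention vector of length 2 * out_dim
    Returns (updated_features, attention_weights).
    """
    # Project every node with the shared weight matrix.
    z = {n: matvec(W, h) for n, h in features.items()}
    updated, attention = {}, {}
    for i, nbrs in neighbors.items():
        # Unnormalized score: LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(w * x for w, x in zip(a, z[i] + z[j])))
            for j in nbrs
        ]
        # Softmax over the neighborhood (shifted for numerical stability).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        attention[i] = dict(zip(nbrs, alphas))
        # Aggregate projected neighbor features, weighted by attention.
        dim = len(z[i])
        updated[i] = [
            sum(al * z[j][k] for al, j in zip(alphas, nbrs))
            for k in range(dim)
        ]
    return updated, attention


# Hypothetical clip graph: two character nodes and one textual node.
features = {
    "char_A": [1.0, 0.0],
    "char_B": [0.0, 1.0],
    "dialogue": [0.5, 0.5],
}
neighbors = {n: list(features) for n in features}  # fully connected
W = [[1.0, 0.0], [0.0, 1.0]]                       # toy identity projection
a = [0.1, 0.2, 0.3, 0.4]                           # toy attention vector

updated, attention = gat_layer(features, neighbors, W, a)
```

Each row of `attention` is a probability distribution over a node's neighbors, so the updated representation of a character is a weighted mix of the visual and textual nodes it attends to.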