Abstract: With the rapid growth of streaming media platforms, viewers have become accustomed to watching dramas and movies online, supported by increasingly intelligent services. In long videos, character relationships often evolve dynamically as the story progresses. Automatic tools that capture the evolution of social relations among characters are therefore urgently needed to enrich the viewing experience. However, most existing works focus on short, isolated video clips. Because they do not account for plot development, they may fail to summarize relationships into holistic semantic representations of the whole video. To address these challenges, we propose a novel Dynamic-Evolutionary Graph Attention Network (DE-GAT) framework that generates the evolving social relation graph among characters and captures the evolutionary trajectory of their relations throughout the entire video. DE-GAT first integrates multimodal cues, including visual and textual information, in each video clip via a graph attention network (GAT). It then expands the temporal receptive field from the clip level to the scenario level, so that the factors most relevant to the evolution of social relationships can be explored. Finally, all scenario-level social graphs are merged to obtain the evolving global social graph for the entire movie. Extensive evaluations on the real-world MovieGraphs dataset validate the positive impact of temporal receptive field expansion and multimodal cues on capturing evolving social relations.