Multimodal Video Understanding using Graph Neural Network

Published: 22 Nov 2022, Last Modified: 05 May 2023, NeurIPS 2022 GLFrontiers Workshop, Readers: Everyone
Abstract: The majority of existing semantic video understanding methods process every video independently, without considering the underlying inter-video relationships. However, videos uploaded by individuals on social media platforms such as YouTube and Instagram exhibit inter-video relationships that reflect the individual's interests, geography, culture, etc. In this work, we explicitly attempt to model these inter-video relationships, which originate from the creators of the videos, using Graph Neural Networks (GNNs) in a multimodal setup. We perform video classification by using the creators of the videos and the semantic similarity between videos to create edges between them, and we observe an improvement of 4% in accuracy.
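As an illustration of the edge construction described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes each video has a creator identifier and a single embedding vector, links two videos when they share a creator or when the cosine similarity of their embeddings reaches a threshold, and then applies one unweighted neighbor-averaging step in the style of a graph convolution. The function names, the `sim_threshold` parameter and its default value are hypothetical choices made for this example.

```python
import numpy as np


def build_video_graph(creators, embeddings, sim_threshold=0.8):
    """Return a symmetric 0/1 adjacency matrix over videos.

    Assumption for this sketch: two videos are linked if they share a
    creator, or if the cosine similarity of their embeddings is at least
    sim_threshold. The threshold value is illustrative only.
    """
    creators = np.asarray(creators)
    emb = np.asarray(embeddings, dtype=float)

    # Creator edges: videos uploaded by the same creator.
    same_creator = creators[:, None] == creators[None, :]

    # Semantic edges: cosine similarity of L2-normalized embeddings.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)
    similar = (unit @ unit.T) >= sim_threshold

    adj = (same_creator | similar).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added later, in aggregation
    return adj


def mean_aggregate(adj, features):
    """One round of neighbor averaging with self-loops (no learned weights)."""
    features = np.asarray(features, dtype=float)
    a_hat = adj + np.eye(adj.shape[0])
    degree = a_hat.sum(axis=1, keepdims=True)
    return (a_hat @ features) / degree


# Toy example: videos 0 and 1 share a creator; videos 0 and 2 are similar.
creators = ["a", "a", "b"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
adj = build_video_graph(creators, embeddings)
smoothed = mean_aggregate(adj, embeddings)
```

In a trained model, the averaged features would pass through learned weight matrices and a classifier; this sketch only shows how creator and similarity edges could let information flow between related videos.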