Intel Labs at ActivityNet Challenge 2022: SPELL for Long-Term Active Speaker Detection

Kyle Min, Sourya Roy, Subarna Tripathi, Tanaya Guha, Somdeb Majumdar

17 Nov 2022 (modified: 17 Nov 2022) | OpenReview Archive Direct Upload | Readers: Everyone

Abstract: In this report, we describe SPELL, a novel spatial-temporal graph learning framework for active speaker detection (ASD). First, each person in a video frame is encoded as a unique node for that frame. The nodes corresponding to the same person across frames are connected to encode that person's temporal dynamics. Nodes within a frame are also connected to encode inter-person relationships. SPELL thus reduces ASD to a node classification task. Importantly, SPELL can reason over long temporal contexts for all nodes at low computational cost.
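The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Detection`, `build_graph`, and `max_gap` are hypothetical, and the abstract does not say how far apart in time two nodes of the same person may be and still be connected, so the `max_gap` window is an assumption. The sketch only shows the two edge types (same person across frames, different people within a frame) that turn ASD into node classification.

```python
from itertools import combinations
from typing import NamedTuple


class Detection(NamedTuple):
    """One person in one frame; becomes one graph node (hypothetical type)."""
    frame: int   # frame index
    person: str  # identity (track) label


def build_graph(detections, max_gap=1):
    """Return (nodes, temporal_edges, spatial_edges).

    Node i corresponds to detections[i]; edges are undirected pairs (i, j)
    with i < j. `max_gap` is an assumed temporal window, not a value taken
    from the report.
    """
    nodes = list(detections)
    temporal_edges, spatial_edges = [], []
    for i, j in combinations(range(len(nodes)), 2):
        a, b = nodes[i], nodes[j]
        if a.frame == b.frame and a.person != b.person:
            # Different people in the same frame: inter-person relationship.
            spatial_edges.append((i, j))
        elif a.person == b.person and 0 < abs(a.frame - b.frame) <= max_gap:
            # Same person in nearby frames: temporal dynamics.
            temporal_edges.append((i, j))
    return nodes, temporal_edges, spatial_edges


# Two people in frames 0 and 1, only person "A" in frame 2.
detections = [
    Detection(0, "A"), Detection(0, "B"),
    Detection(1, "A"), Detection(1, "B"),
    Detection(2, "A"),
]
nodes, temporal_edges, spatial_edges = build_graph(detections)
print(temporal_edges)  # [(0, 2), (1, 3), (2, 4)]
print(spatial_edges)   # [(0, 1), (2, 3)]
```

Each node would then carry a per-person feature vector and receive a speaking or not-speaking label from a graph classifier; those parts are outside the scope of this sketch.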
