Spotting Visual Keywords from Temporal Sliding Windows

Yue Yao, Tianyu Wang, Heming Du, Liang Zheng, Tom Gedeon

2019 (modified: 16 Nov 2022) | ICMI 2019 | Readers: Everyone

Abstract: Visual Keyword Spotting (KWS), a recently proposed task derived from visual speech recognition, has plenty of room for improvement. This paper details the Visual Keyword Spotting system we used in the first Mandarin Audio-Visual Speech Recognition Challenge (MAVSR 2019). Assuming that the vocabulary of the target dataset is a subset of the vocabulary of the training set, we propose a simple and scalable classification-based strategy that achieves 19.0% mean average precision (mAP) on this challenge. Our method uses sliding windows to bridge the word-level dataset and the sentence-level dataset, showing that a strong word-level classifier can be used directly to build sentence embeddings, thereby making it possible to build a KWS system.
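
The sliding-window idea can be illustrated with a minimal sketch. This is not the authors' implementation: the window length, the stride, the max-pooling aggregation, and the names `sliding_windows`, `sentence_embedding`, and `toy_classify` are all assumptions made for illustration, and the toy classifier merely stands in for a trained word-level model.

```python
from typing import Callable, List, Sequence


def sliding_windows(frames: Sequence, window: int, stride: int) -> List[Sequence]:
    """Split a frame sequence into fixed-length, possibly overlapping windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(frames) < window:
        # Clip shorter than one window: treat the whole clip as a single window.
        return [frames]
    last_start = len(frames) - window
    return [frames[s:s + window] for s in range(0, last_start + 1, stride)]


def sentence_embedding(
    frames: Sequence,
    classify: Callable[[Sequence], List[float]],
    window: int,
    stride: int,
) -> List[float]:
    """Score every window with a word-level classifier, then pool per word.

    Max pooling over windows is an assumption of this sketch; the abstract
    does not say how window scores are aggregated.
    """
    scores = [classify(w) for w in sliding_windows(frames, window, stride)]
    # zip(*scores) groups the scores of each vocabulary entry across windows.
    return [max(per_word) for per_word in zip(*scores)]


VOCAB_SIZE = 3  # hypothetical vocabulary of three keywords


def toy_classify(window: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for a word-level classifier.

    Each toy "frame" is the id of the word being spoken; the score for a
    word is the fraction of frames in the window that carry its id.
    """
    return [sum(1 for f in window if f == k) / len(window) for k in range(VOCAB_SIZE)]


# A toy "sentence" in which word 0 is followed by word 2; word 1 never occurs.
sentence = [0, 0, 0, 0, 2, 2, 2, 2]
embedding = sentence_embedding(sentence, toy_classify, window=4, stride=2)
print(embedding)  # one score per vocabulary word
```

Under these assumptions, the embedding entry for a keyword can serve directly as its spotting score for that sentence, so ranking sentences by that entry yields the retrieval list on which mAP is computed.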
