Abstract: Continuously measuring the speaking state of users interacting with a robot in a human-robot interaction (HRI) system improves metrics of interaction quality. However, mainstream active speaker detection (ASD) algorithms emphasize achieving high frame-level AUC on the AVA-Active Speaker dataset and pay less attention to real-time performance in robotic systems. In this paper, we propose a model named FSDNet that maintains a high AUC score on the AVA-Active Speaker dataset while reducing time cost. Our model increases the AUC score by 0.1% compared with the state of the art and requires only 75% of its running time. Furthermore, we put forward an architecture with a time-related prediction function to make our algorithm more effective and generative in interactive robotic systems. The code is released at https://github.com/huangmin9966/FSDNet_RealTimeArch.