Abstract: Visual information is often used as a complementary source to improve the understanding of what a speaker is saying, especially in noisy environments. This paper investigates different lip features for visual speech and speaker recognition and analyzes in depth their robustness to different speaking habits. Five candidate features extracted from lip shape are tested and compared on a multispeaker visual speech recognition task of isolated English digits (0 to 9). Our experimental results demonstrate that the rotational angle caused by head pose is highly correlated with the individual speaker but independent of the speech content. The best shape features for speech and speaker recognition are considered to be those that provide “dynamic” information, such as rotation and lip motion.