Unsupervised Random Forest Manifold Alignment for Lipreading

Yuru Pei, Tae-Kyun Kim, Hongbin Zha

2013 (modified: 10 Nov 2022), ICCV 2013

Abstract: Lip reading from visual channels remains challenging because speaking characteristics vary widely. In this paper, we propose an efficient lip reading approach based on unsupervised random forest manifold alignment (RFMA). A density random forest is employed to estimate the affinity of patch trajectories in videos of speaking faces. We propose novel node-splitting criteria to avoid rank deficiency when learning density forests. By virtue of the hierarchical structure of random forests, the trajectory affinities are measured efficiently and are then used to find embeddings of the speaking video clips by a graph-based algorithm. Lip reading is formulated as matching between the manifolds of query and reference video clips. We employ a manifold alignment technique for matching, in which an L∞-norm-based manifold-to-manifold distance is proposed to find the matching pairs. We apply this random forest manifold alignment technique to various video data sets captured by consumer cameras. The experiments demonstrate that lip reading can be performed effectively and that the approach outperforms state-of-the-art methods.
