A Compact Representation of Visual Speech Data Using Latent Variables.

IEEE Trans. Pattern Anal. Mach. Intell., 2014 (modified: 02 Feb 2024)
Abstract: Visual speech recognition involves decoding the video dynamics of a talking mouth in a high-dimensional visual space. In this paper, we propose a generative latent variable model that provides a compact representation of visual speech data. The model uses separate latent variables to represent the inter-speaker variations in visual appearance and the variations caused by uttering. It also incorporates the structural information of the visual data observed within an utterance by modelling that structure as a path graph and placing the variables' priors along the graph's embedded curve.
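The abstract does not spell out how the path graph is embedded, so the following is only a minimal sketch of one standard construction, not the paper's implementation. It assumes one graph node per video frame and takes the embedded curve from the eigenvectors of the graph Laplacian with the smallest non-zero eigenvalues. The function names, `n_frames`, and `dim` are hypothetical and chosen for illustration.

```python
import numpy as np


def path_graph_laplacian(n):
    """Laplacian L = D - A of a path graph with n nodes (assumed: one node per frame)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    # Connect each node to its successor, symmetrically.
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A


def embed_path_graph(n, dim):
    """Return the dim smallest non-zero eigenvalues and the matching (n, dim) embedding."""
    # eigh returns eigenvalues in ascending order; index 0 is the constant
    # eigenvector with eigenvalue 0, which carries no positional information.
    eigvals, eigvecs = np.linalg.eigh(path_graph_laplacian(n))
    return eigvals[1:dim + 1], eigvecs[:, 1:dim + 1]


n_frames = 20  # hypothetical utterance length
eigvals, curve = embed_path_graph(n_frames, dim=2)
# Row t of `curve` is the position of frame t on the embedded curve.
print(curve.shape)
```

For a path graph these eigenvectors are sampled cosines, so consecutive frames land on neighbouring points of a smooth curve. In a model like the one described, such points could serve as locations for the priors of the per-frame latent variables, though that pairing is an assumption here.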