DEEP COMPLEMENTARY BOTTLENECK FEATURES FOR VISUAL SPEECH RECOGNITION

12 Jan 2020 · OpenReview Archive Direct Upload
Abstract: Deep bottleneck features (DBNFs) have been used successfully in the past for acoustic speech recognition from audio. However, research on extracting DBNFs for visual speech recognition is very limited. In this work, we present an approach to extract deep bottleneck visual features based on deep autoencoders. To the best of our knowledge, this is the first work that extracts DBNFs for visual speech recognition directly from pixels. We first train a deep autoencoder with a bottleneck layer in order to reduce the dimensionality of the image. Then the autoencoder's decoding layers are replaced by classification layers, which make the bottleneck features more discriminative. Discrete Cosine Transform (DCT) features are also appended in the bottleneck layer during training in order to make the bottleneck features complementary to DCT features. Long Short-Term Memory (LSTM) networks are used to model the temporal dynamics, and performance is evaluated on the OuluVS and AVLetters databases. The extracted complementary DBNFs, in combination with DCT features, achieve the best performance, resulting in an absolute improvement of up to 5% over the DCT baseline.
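The DCT baseline features that the bottleneck features are trained to complement can be sketched as follows. This is a minimal illustration of 2-D DCT feature extraction from a grayscale mouth region of interest, not the paper's exact pipeline; the ROI size, the number of coefficients kept, and the zig-zag selection are assumptions for the sketch:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (n x n).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] /= np.sqrt(2)          # scale the DC row for orthonormality
    return m * np.sqrt(2.0 / n)

def dct_features(roi, keep=30):
    # 2-D DCT of a grayscale mouth ROI; keep the lowest-frequency
    # coefficients (zig-zag order) as the static feature vector.
    h, w = roi.shape
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    # Zig-zag: order coefficients by diagonal index (row + col).
    order = sorted(((r, c) for r in range(h) for c in range(w)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    return np.array([coeffs[r, c] for r, c in order[:keep]])
```

Computing such a vector per video frame yields the per-frame feature sequence that an LSTM can then model over time; in the approach above, the learned bottleneck features are concatenated with these DCT features before temporal modeling.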
