Gated ConvNets for Letter-Based ASR

Vitaliy Liptchinsky; Gabriel Synnaeve; Ronan Collobert

15 Feb 2018 (modified: 10 Feb 2022) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: In this paper we introduce a new speech recognition system built on a simple letter-based ConvNet acoustic model. The acoustic model requires only audio transcriptions for training -- neither alignment annotations nor a forced alignment step is needed. At inference, our decoder takes only a word list and a language model, and is fed letter scores from the acoustic model -- no phonetic word lexicon is needed. Key ingredients of the acoustic model are Gated Linear Units and high dropout. We show near state-of-the-art word error rates on the LibriSpeech corpus with MFSC features, on both the clean and other configurations.

TL;DR: A letter-based ConvNet acoustic model leads to a simple and competitive speech recognition pipeline.

Keywords: automatic speech recognition, letter-based acoustic model, gated convnets

Data: [LibriSpeech](https://paperswithcode.com/dataset/librispeech)
