Prosodic knowledge sources for automatic speech recognition

Dimitra Vergyri, Andreas Stolcke, Venkata Ramana Rao Gadde, Luciana Ferrer, Elizabeth Shriberg

Published: 2003, Last Modified: 03 May 2023 | ICASSP (1) 2003 | Readers: Everyone

Abstract: In this work, different prosodic knowledge sources are integrated into a state-of-the-art large vocabulary speech recognition system. Prosody manifests itself on different levels in the speech signal: within the words as a change in phone durations and pitch, in between the words as a variation in the pause length, and beyond the words, correlating with higher linguistic structures and nonlexical phenomena. We investigate three models, each exploiting a different level of prosodic information, in rescoring N-best hypotheses according to how well recognized words correspond to prosodic features of the utterance. Experiments on the Switchboard corpus show word accuracy improvements with each prosodic knowledge source. A further improvement is observed with the combination of all models, demonstrating that they each capture somewhat different prosodic characteristics of the speech signal.
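The N-best rescoring idea described in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each hypothesis carries log-domain scores from the baseline recognizer and from three prosodic models, and that these are combined as a weighted sum. The score names (`acoustic`, `lm`, `duration`, `pause`, `hidden_event`), the weights, and all numbers are hypothetical; the abstract does not specify the combination method the authors used or how weights were tuned.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    """One entry of an N-best list: a word sequence plus per-source log scores."""
    words: List[str]
    scores: Dict[str, float]  # hypothetical knowledge-source name -> log score


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of log scores; sources missing from the hypothesis count as 0."""
    return sum(w * hyp.scores.get(name, 0.0) for name, w in weights.items())


def rescore_nbest(nbest: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, weights))


# Toy two-entry N-best list with made-up scores (not data from the paper).
nbest = [
    Hypothesis(
        words=["i", "see"],
        scores={"acoustic": -100.0, "lm": -10.0,
                "duration": -5.0, "pause": -2.0, "hidden_event": -3.0},
    ),
    Hypothesis(
        words=["icy"],
        scores={"acoustic": -99.0, "lm": -10.5,
                "duration": -9.0, "pause": -2.5, "hidden_event": -4.0},
    ),
]

# Baseline uses only acoustic and language model scores.
baseline_weights = {"acoustic": 1.0, "lm": 1.0}

# Adding the three (hypothetical) prosodic scores with illustrative weights.
prosody_weights = {**baseline_weights,
                   "duration": 0.5, "pause": 0.5, "hidden_event": 0.5}

best_baseline = rescore_nbest(nbest, baseline_weights)
best_prosody = rescore_nbest(nbest, prosody_weights)
```

In this toy example the baseline weights favor the second hypothesis, while adding the prosodic scores shifts the choice to the first. In practice the weights would be tuned on held-out data, and each knowledge source could be switched on individually to measure its separate contribution.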
