Categorical understanding using statistical ngram models

Alexandros Potamianos, Giuseppe Riccardi, Shrikanth S. Narayanan

Published: 1999, Last Modified: 24 Jun 2024, EUROSPEECH 1999, CC BY-SA 4.0

Abstract: In this paper, the speech understanding problem in the context of a spoken dialog system is formalized in a maximum likelihood framework. Word and dialog-state n-grams are used for building categorical understanding and dialog models, respectively. Acoustic confidence scores are incorporated in the understanding formulation. Problems due to data sparseness and out-of-vocabulary words are discussed. Incorporating dialog models reduces relative understanding error rate by 15-25%, while acoustic confidence scores achieve a further 10% error reduction for a computer gaming application.
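
As a rough illustration of the kind of formulation the abstract describes, the following minimal Python sketch picks the category that maximizes the sum of a word-bigram log-likelihood and a dialog-state bigram log-probability. It is not the authors' implementation: the class `BigramModel`, the function `understand`, the add-one smoothing, the toy training data, and the choice to treat the previous category as the dialog state are all assumptions made for this example, and acoustic confidence scores are left out entirely.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model (hypothetical helper, illustration only)."""

    def __init__(self, sequences, vocab):
        # +2 accounts for the start and end markers.
        self.vocab_size = len(vocab) + 2
        self.bigram = Counter()
        self.context = Counter()
        for seq in sequences:
            tokens = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigram[(prev, cur)] += 1
                self.context[prev] += 1

    def log_prob(self, prev, cur):
        # Smoothing gives unseen and out-of-vocabulary tokens non-zero mass.
        num = self.bigram[(prev, cur)] + 1
        den = self.context[prev] + self.vocab_size
        return math.log(num / den)

    def sequence_log_prob(self, seq):
        tokens = ["<s>"] + list(seq) + ["</s>"]
        return sum(self.log_prob(p, c) for p, c in zip(tokens, tokens[1:]))


# Toy training data (invented for this sketch).
utterances = {
    "move": [["go", "north"], ["move", "north"], ["go", "south"]],
    "attack": [["attack", "the", "troll"], ["hit", "the", "troll"]],
}
dialogs = [["move", "attack"], ["move", "move", "attack"]]

vocab = {w for seqs in utterances.values() for seq in seqs for w in seq}
word_models = {c: BigramModel(seqs, vocab) for c, seqs in utterances.items()}
dialog_model = BigramModel(dialogs, set(utterances))


def understand(words, prev_state="<s>"):
    """Return the category maximizing log P(words | c) + log P(c | prev_state)."""
    def score(category):
        return (word_models[category].sequence_log_prob(words)
                + dialog_model.log_prob(prev_state, category))
    return max(word_models, key=score)
```

Calling `understand(["go", "north"])` scores both categories and returns the better one; passing a different `prev_state` can change the decision for utterances the word models cannot separate, such as a single out-of-vocabulary word.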