Leveraging ASR N-Best in Deep Entity Retrieval

Haoyu Wang; John Chen; Majid Laali; Kevin Durda; Jeff King; William Campbell; Yang Liu

Published: 01 Jan 2021, Last Modified: 13 Nov 2024, Interspeech 2021, CC BY-SA 4.0

Abstract: Entity Retrieval (ER) in spoken dialog systems is the task of retrieving the catalog entities that correspond to the entity mentions in user utterances. ER systems are susceptible to upstream errors, and Automatic Speech Recognition (ASR) errors are particularly troublesome. In this work, we propose a robust deep-learning-based ER system that leverages ASR N-best hypotheses. Specifically, we evaluate different neural architectures that incorporate the ASR N-best hypotheses through an attention mechanism. On 750 hours of audio data taken from live traffic, our best model achieves an 11.07% relative error reduction while maintaining the same performance on rejecting out-of-domain ER requests.
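The abstract does not spell out the architectures, so the following is only a rough illustration of what fusing N-best hypotheses through attention can look like. It is a minimal sketch, assuming each ASR hypothesis has already been encoded into a fixed-size embedding by some upstream encoder, and that the 1-best embedding serves as the attention query. The function names `softmax` and `attend_over_nbest`, the choice of query, and the use of scaled dot-product attention are hypothetical assumptions, not the authors' models.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_nbest(
    query: Sequence[float], hyp_embeddings: Sequence[Sequence[float]]
) -> Tuple[Vector, Vector]:
    """Fuse N-best hypothesis embeddings into one vector (hypothetical helper).

    query: vector each hypothesis is scored against (assumed here to be the
        1-best hypothesis embedding; a learned query would also work).
    hyp_embeddings: one embedding per ASR hypothesis, each of dimension d.
    Returns (fused_vector, attention_weights).
    """
    if not hyp_embeddings:
        raise ValueError("need at least one hypothesis")
    d = len(query)
    if any(len(h) != d for h in hyp_embeddings):
        raise ValueError("all embeddings must match the query dimension")

    # Scaled dot-product score between the query and each hypothesis.
    scale = 1.0 / math.sqrt(d)
    scores = [scale * sum(q * x for q, x in zip(query, h)) for h in hyp_embeddings]

    # Normalize scores into weights that sum to 1.
    weights = softmax(scores)

    # Weighted sum of hypothesis embeddings, dimension by dimension.
    fused = [sum(w * h[j] for w, h in zip(weights, hyp_embeddings)) for j in range(d)]
    return fused, weights


if __name__ == "__main__":
    # Toy 2-d embeddings for a 3-best list; values are illustrative only.
    nbest = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    fused, weights = attend_over_nbest(nbest[0], nbest)
    print([round(w, 3) for w in weights], [round(x, 3) for x in fused])
```

In this sketch, hypotheses that agree with the 1-best receive larger weights, while dissimilar ones still contribute a nonzero share, so the fused vector can retain information that only appears lower in the N-best list.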

Loading