Interactive Spoken Content Retrieval by Deep Reinforcement Learning

2018 (modified: 11 Nov 2021), IEEE ACM Trans. Audio Speech Lang. Process. 2018, Readers: Everyone
Abstract: For text content retrieval, the user can easily scan through and select from a list of retrieved items. This is impossible for spoken content retrieval, because the retrieved items are not easily displayed on-screen. In addition, due to the high degree of uncertainty in speech recognition, retrieval results can be very noisy. One way to counter such difficulties is through user-machine interaction. The machine can take different actions to interact with the user and obtain better retrieval results before showing them to the user. For example, the machine can request extra information from the user, return a list of topics for the user to select from, and so on. In this paper, we propose using a deep Q-network (DQN) to determine the machine actions for interactive spoken content retrieval. DQN bypasses the need to estimate hand-crafted states, and directly determines the best action based on the present retrieval results, even without any human knowledge. It is shown to achieve significantly better performance than the previous approach based on hand-crafted states. We further find that double DQN and dueling DQN improve on the naive version.
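
To make the difference between the three variants named in the abstract concrete, here is a minimal sketch in plain Python of the standard textbook formulations: the naive DQN target, the double DQN target, and the dueling aggregation of a state value and per-action advantages. It is not the paper's implementation. The function names, the discount factor, the example Q-values, and the three-action set (request more information, return a topic list, show results) are all assumptions made for illustration.

```python
# Hypothetical action set, loosely based on the examples in the abstract.
ACTIONS = ["request_more_info", "return_topic_list", "show_results"]


def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Naive DQN target: the target network both picks and scores the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


if __name__ == "__main__":
    # Made-up Q-value estimates for the next state, one entry per action in ACTIONS.
    next_q_online = [1.0, 3.0, 2.0]
    next_q_target = [2.5, 1.5, 2.0]
    print("naive DQN target: ", dqn_target(1.0, next_q_target, gamma=0.5))
    print("double DQN target:", double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.5))
    print("dueling Q-values: ", dict(zip(ACTIONS, dueling_q(2.0, [1.0, 0.0, -1.0]))))
```

In the naive target, the same network estimate is used both to choose and to evaluate the next action, which tends to overestimate values; the double variant separates those two roles. The dueling form subtracts the mean advantage so that the value and advantage terms remain identifiable.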