Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs

Jian Sun, Wu Guo, Zhi Chen, Yan Song

Published: 2019, Last Modified: 03 Apr 2025, ICASSP 2019, CC BY-SA 4.0

Abstract: This paper addresses topic detection for conversational telephone speech (CTS). The low accuracy of automatic speech recognition (ASR) on CTS severely degrades topic detection performance. To compensate, we adopt two ASR systems, an HMM-BiLSTM system and a CTC system, to provide complementary information for topic detection. After obtaining the two sets of recognized transcriptions, a CNN with multi-stream inputs is trained, and the output of its pooling layer serves as the document representation for each stream. Finally, the element-wise sum of the document representations from the two streams is used as the distributed representation of each document, and these representations are fed into an agglomerative hierarchical clustering (AHC) algorithm to obtain the clustering results. Experiments on a Japanese speech corpus demonstrate that the proposed approach can significantly improve the performance of topic detection.
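The final two steps of the pipeline (element-wise summation of the two stream representations, followed by AHC) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the distance metric or linkage criterion, so cosine distance and average linkage are assumptions here, and the function names (`fuse`, `cosine_distance`, `ahc`) and the toy 2-dimensional vectors are hypothetical stand-ins for the CNN pooling-layer outputs.

```python
import math
from itertools import combinations


def fuse(stream_a, stream_b):
    """Element-wise sum of the per-document vectors from two streams."""
    return [[x + y for x, y in zip(va, vb)] for va, vb in zip(stream_a, stream_b)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; requires non-zero vectors)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm


def ahc(vectors, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per document and repeatedly merges the two
    clusters with the smallest mean pairwise distance until n_clusters
    remain. Returns a sorted list of sorted document-index lists.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                cosine_distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy stand-ins for the pooling-layer outputs of each stream (made-up values).
hmm_bilstm_repr = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
ctc_repr = [[0.8, 0.0], [1.0, 0.1], [0.0, 0.9], [0.1, 1.1]]

doc_repr = fuse(hmm_bilstm_repr, ctc_repr)
topic_clusters = ahc(doc_repr, n_clusters=2)
print(topic_clusters)
```

In this toy example, documents 0 and 1 point in roughly the same direction in both streams, as do documents 2 and 3, so the summed representations fall into two groups. The naive search over all cluster pairs is cubic in the number of documents and is only meant to make the merging rule explicit; a library implementation would be preferable for a real corpus.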