Abstract: This paper addresses topic detection for conversational telephone speech (CTS). Low automatic speech recognition (ASR) accuracy severely degrades topic detection performance. To compensate, we use two ASR systems, an HMM-BiLSTM system and a CTC system, to provide complementary information for topic detection. From the two resulting sets of recognized transcriptions, we train a CNN with multi-stream inputs and take the output of the pooling layer as the document representation for each stream. The element-wise sum of the two streams' document representations then serves as the distributed representation of each document, which is fed into an agglomerative hierarchical clustering (AHC) algorithm to obtain the clustering results. Experiments on a Japanese speech corpus demonstrate that the proposed approach can significantly improve the performance of topic detection.
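The last two steps of the pipeline, fusing the two streams by element-wise summation and clustering the fused vectors with AHC, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors stand in for pooled CNN outputs, and the cosine distance, average linkage, and fixed number of clusters are all assumptions, since the abstract does not specify them. The names `fuse`, `cosine_distance`, and `ahc` are hypothetical helpers.

```python
import math


def fuse(rep_a, rep_b):
    """Element-wise sum of two equal-length document vectors."""
    if len(rep_a) != len(rep_b):
        raise ValueError("stream representations must have the same dimension")
    return [a + b for a, b in zip(rep_a, rep_b)]


def cosine_distance(u, v):
    """1 - cosine similarity (assumed metric; not stated in the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def ahc(vectors, num_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector.

    The linkage criterion and the stopping rule (a fixed cluster count)
    are assumptions made for this sketch.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(vectors[p], vectors[q])
                         for p in clusters[i] for q in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        # Merge cluster j into cluster i (j > i, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Toy stand-ins (made-up numbers) for pooled representations of four documents,
# one list per ASR stream.
stream_hmm = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
stream_ctc = [[0.7, 0.0, 0.1], [0.9, 0.1, 0.0], [0.2, 0.7, 0.1], [0.1, 0.9, 0.0]]

fused = [fuse(a, b) for a, b in zip(stream_hmm, stream_ctc)]
labels = ahc(fused, num_clusters=2)
print(labels)
```

In this toy data the first two documents share one dominant dimension and the last two share another, so the two-cluster result groups them accordingly. Summation keeps the fused vector at the same dimension as each stream, which is why both streams must produce representations of equal size.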