Comparative Analysis of Existing and a Novel Approach to Topic Detection on Conversational Dialogue Corpora
Abstract: Topic detection in dialogue corpora has become a major challenge for conversational systems, and efficient conversational topic prediction is a critical part of constructing cohesive and engaging dialogue systems (Sun et al., 2019). This paper proposes unsupervised and semi-supervised techniques for topic detection in conversational dialogue corpora and compares them with existing techniques. Existing topic detection techniques have mostly been applied to tweets, blogs, documents, and other textual data on the web. Because textual dialogues typically consist of short, irregular sentences, we apply these existing techniques to dialogue corpora and compare their results with those of the proposed approach. The proposed approach combines clustering of known similar words, TF-IDF scores, and bag-of-words (BOW) techniques with the Parallel Latent Dirichlet Allocation (PLDA) model to detect topics. It also integrates the elbow method, used for interpretation and validation, to select the optimal number of clusters. The paper presents a comparative analysis of traditional LDA, clustering approaches, and the proposed approach on both unlabelled (unsupervised) and partially labelled (semi-supervised) versions of the Switchboard corpus. The evaluation results show that the proposed approach performs best on the partially labelled topic dialogue corpus, outperforming the traditional and unsupervised methods.
Paper Type: long
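The abstract names bag-of-words counts and TF-IDF scores as inputs to the proposed approach without giving their exact form. As a rough illustration only, the sketch below computes both for a few short utterances using the Python standard library. The weighting (term count divided by utterance length, times log(N / document frequency)), the whitespace tokenizer, the function names, and the toy utterances are all assumptions made for this example; they are not taken from the paper, and the clustering, PLDA, and elbow-method steps are not shown.

```python
import math
from collections import Counter


def bag_of_words(utterance):
    """Lowercase, split on whitespace, and count tokens (assumed tokenizer)."""
    return Counter(utterance.lower().split())


def tf_idf(utterances):
    """Return one {term: weight} dict per utterance.

    Assumes tf = count / utterance length and idf = log(N / df).
    Other weighting variants exist; the paper does not say which it uses.
    """
    bags = [bag_of_words(u) for u in utterances]
    n_docs = len(bags)

    # Document frequency: in how many utterances each term appears.
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())

    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical toy utterances, not drawn from the Switchboard corpus.
dialogue = [
    "do you like baseball",
    "yeah i like baseball a lot",
    "do you cook at home",
]
scores = tf_idf(dialogue)

# Highest-weighted term per utterance (ties resolve to the first term seen).
top_terms = [max(s, key=s.get) for s in scores]
```

With utterances this short, many terms tie or score low, which hints at why the abstract treats short, irregular dialogue turns as harder input than longer documents.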