Topic Detection by Clustering Keywords

2008 (modified: 17 Dec 2021), DEXA Workshops 2008
Abstract: We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the Wikipedia categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms cosine similarity. In particular, a newly proposed term distribution that takes co-occurrence of terms into account gives the best results.
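
To make the two kinds of measure in the abstract concrete, the following is a minimal sketch of cosine similarity and the Jensen-Shannon divergence between two discrete probability distributions. It is not the paper's implementation: the function names, the base-2 logarithm, and the toy distributions are assumptions chosen for illustration, and the abstract does not specify how the paper's term distributions are constructed.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, m):
    """Kullback-Leibler divergence D(p || m) in bits.

    Terms with p_i == 0 contribute nothing. This assumes m_i > 0 wherever
    p_i > 0, which always holds when m is the mixture used below.
    """
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence between two distributions, in bits.

    With base-2 logarithms the value lies between 0 (identical
    distributions) and 1 (distributions with disjoint support).
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]  # equal-weight mixture
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical distributions of two keywords over three contexts.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
jsd = jensen_shannon_divergence(p, q)
cos = cosine_similarity(p, q)
```

Note the opposite orientation of the two quantities: the Jensen-Shannon divergence is a dissimilarity (0 means identical), whereas cosine similarity is largest for vectors pointing in the same direction. A clustering routine that expects distances would typically use the divergence directly and convert the cosine, for example as 1 minus the similarity.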