Using Term Clustering and Supervised Term Affinity Construction to Boost Text Classification

Chong Wang, Wenyuan Wang

2005 (modified: 11 Apr 2022), PAKDD 2005

Abstract: Measuring similarity is a crucial step in many machine learning problems. Traditional cosine similarity cannot represent the semantic relationships between terms. This paper explores a kernel-based similarity measure built on term clustering. An affinity matrix of terms is constructed from the co-occurrence of terms, in both unsupervised and supervised ways. Normalized cut is then used to cluster the terms, cutting off noisy edges. A diffusion kernel is adopted to measure the kernel-like similarity of terms within the same cluster. Experiments demonstrate that our methods give satisfactory results, even when the training set is small.
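
The pipeline the abstract describes (co-occurrence affinity, normalized cut, diffusion kernel within a cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the affinity weighting, the supervised construction, the cut strategy, or the diffusion parameter, so everything here is an assumption made for illustration. The function names, the raw co-occurrence count as the affinity, the single two-way spectral relaxation of normalized cut, the kernel form exp(-beta * L) over the unnormalized graph Laplacian, and the value of `beta` are all hypothetical. Only the unsupervised affinity is shown.

```python
import numpy as np

def cooccurrence_affinity(X):
    """Term-term affinity from a document-term matrix X (docs x terms).

    Assumption: affinity = number of documents in which both terms occur.
    """
    B = (np.asarray(X) > 0).astype(float)
    W = B.T @ B
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def two_way_normalized_cut(W):
    """Split terms into two clusters with the spectral relaxation of
    normalized cut: threshold the second-smallest generalized eigenvector
    of (D - W) y = lambda * D y at zero."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    y = vecs[:, 1] * d_inv_sqrt           # map back to the generalized problem
    return (y > 0).astype(int)

def diffusion_kernel(W, beta=0.5):
    """Diffusion kernel K = exp(-beta * L), with L = D - W, computed via
    the eigendecomposition of the symmetric Laplacian."""
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    return (vecs * np.exp(-beta * vals)) @ vecs.T

# Toy corpus: terms 0 and 1 co-occur, terms 2 and 3 co-occur,
# and one "noisy" document links term 1 to term 2.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 0],
])

W = cooccurrence_affinity(X)
labels = two_way_normalized_cut(W)

# Term similarity inside the cluster that contains term 0.
members = np.flatnonzero(labels == labels[0])
K = diffusion_kernel(W[np.ix_(members, members)], beta=0.5)
```

On this toy corpus the cut separates the two term pairs and drops the weak edge created by the noisy document; `K` then gives a symmetric, positive definite similarity between the terms that remain in the same cluster. A full system would presumably apply the cut recursively to obtain more than two clusters, which this sketch does not do.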
