Abstract: The document similarity measure is central to textual data processing and largely determines the performance of a processing system. For the past decade, kernels have been used as similarity functions within inner-product-based algorithms such as the SVM for NLP problems, especially text categorization. In this paper, we present a semantic space constructed from latent concepts. The concepts are extracted using Latent Semantic Analysis (LSA). To account for the specificity of each document category, we use local LSA to define the global semantic space. Furthermore, we propose a weighted semantic kernel for this global space. Experimental results on text categorization tasks show that this kernel outperforms global LSA kernels, especially for small LSA dimensions.
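
The idea of building a global semantic space from per-category (local) LSA concepts, and then defining a weighted kernel on it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the function names, the toy data, and in particular the weighting scheme (normalized singular values) are hypothetical and are not taken from the paper, whose actual weighting is not described in this abstract.

```python
import numpy as np

def local_lsa_basis(X, k):
    """Return the top-k left singular vectors (latent concepts) and
    singular values of a term-by-document matrix X."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k]

def build_global_space(X, labels, k):
    """Run LSA separately on the documents of each category and stack
    the resulting local concepts into one projection matrix."""
    bases, weights = [], []
    for c in np.unique(labels):
        U, s = local_lsa_basis(X[:, labels == c], k)
        bases.append(U)
        # Assumed weighting (not from the paper): normalized singular values.
        weights.append(s / s.sum())
    return np.hstack(bases), np.concatenate(weights)

def weighted_semantic_kernel(d1, d2, P, w):
    """k(d1, d2) = d1^T P diag(w) P^T d2 for term vectors d1, d2."""
    return float((P.T @ d1) @ (w * (P.T @ d2)))

# Toy data: 6 terms, 8 documents, 2 categories (purely synthetic).
rng = np.random.default_rng(0)
X = rng.random((6, 8))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

P, w = build_global_space(X, labels, k=2)
K = np.array([[weighted_semantic_kernel(X[:, i], X[:, j], P, w)
               for j in range(X.shape[1])]
              for i in range(X.shape[1])])
```

Because the assumed weights are non-negative, the resulting Gram matrix `K` is symmetric and positive semidefinite, so it could in principle be passed to any inner-product-based learner that accepts a precomputed kernel.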