GenDesc: A Partial Generalization of Linguistic Features for Text Classification

Guillaume Tisserant, Violaine Prince, Mathieu Roche

Published: 2013, Last Modified: 04 Oct 2025. NLDB 2013. CC BY-SA 4.0

Abstract: This paper addresses the automatic classification of textual data by supervised learning algorithms. The aim is to study how a better representation of textual data can improve classification quality. Since a word's meaning depends on its context, we propose features that carry important information about word contexts. We present a method named GenDesc, which generalizes, using part-of-speech (POS) tags, the words least relevant to the classification task.
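
The idea of partial generalization can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how relevance is measured, so the score used here (the share of a word's occurrences that fall in its most frequent class), the threshold value, the toy corpus, and the function names `word_relevance` and `generalize` are all assumptions made for illustration. Tokens are assumed to be already POS-tagged.

```python
from collections import Counter, defaultdict


def word_relevance(docs):
    """Score each word by how concentrated it is in a single class.

    docs: list of (tokens, label), where tokens is a list of (word, pos).
    Returns a dict word -> score in (0, 1]. This scoring rule is an
    assumption for illustration, not the measure used in the paper.
    """
    per_class = defaultdict(Counter)
    for tokens, label in docs:
        for word, _pos in tokens:
            per_class[word][label] += 1
    return {
        word: max(counts.values()) / sum(counts.values())
        for word, counts in per_class.items()
    }


def generalize(tokens, relevance, threshold):
    """Keep relevant words; replace the others with their POS tag."""
    return [
        word if relevance.get(word, 0.0) >= threshold else pos
        for word, pos in tokens
    ]


# Hypothetical toy corpus with hand-assigned POS tags.
docs = [
    ([("great", "ADJ"), ("movie", "NOUN")], "pos"),
    ([("awful", "ADJ"), ("movie", "NOUN")], "neg"),
    ([("great", "ADJ"), ("plot", "NOUN")], "pos"),
    ([("awful", "ADJ"), ("plot", "NOUN")], "neg"),
]

relevance = word_relevance(docs)
features = generalize(docs[0][0], relevance, threshold=0.8)
print(features)  # ['great', 'NOUN']
```

In this sketch, "great" and "awful" occur in a single class each and are kept as words, while "movie" and "plot" are spread evenly across classes and are replaced by the tag NOUN. The resulting mixed sequence of words and POS tags could then be fed to any supervised classifier as features.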

External IDs: dblp:conf/nldb/TisserantPR13