Abstract: This paper presents an application that belongs to automatic classification of textual data by supervised learning algorithms. The aim is to study how a better textual data representation can improve the quality of classification. Considering that a word meaning depends on its context, we propose to use features that give important information about word contexts. We present a method named GenDesc, which generalizes (with POS tags) the least relevant words for the classification task.
External IDs:dblp:conf/nldb/TisserantPR13
Loading