Abstract: Sentiment analysis can automatically extract valuable customer information from large amounts of unstructured text data to support decision making in manufacturing applications such as product design and demand planning. One of the key issues in sentiment analysis is the high dimensionality of the data, which can be effectively addressed by feature selection. Existing feature selection techniques either compute feature scores solely from training data statistics or modify a specific feature metric formula to include test data information, an approach that cannot be generalized to other types of feature metrics. In this paper, we propose an adaptive two-stage feature selection approach, which generates base feature scores from a training dataset and then weights them based on each individual test sample, so that the feature importance evaluation is also adapted to the characteristics of the test data. The proposed method is applicable to arbitrary types of feature metrics and sentiment classifiers. The experiments show that our approach consistently outperforms other methods, especially when the number of selected features is small.
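The two-stage idea can be illustrated with a minimal sketch. The abstract does not specify the base metric or the weighting scheme, so both are assumptions here: the base score is the absolute difference in class-wise document frequency, and the per-sample weight is the term's relative frequency in the test document. The function names (`base_scores`, `adapt_scores`, `select_top_k`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def base_scores(train_docs, train_labels):
    """Stage 1: score each term using training statistics only.

    Assumed metric: |P(term | positive) - P(term | negative)| over documents.
    Any other feature metric could be substituted here.
    """
    pos = [set(d) for d, y in zip(train_docs, train_labels) if y == 1]
    neg = [set(d) for d, y in zip(train_docs, train_labels) if y == 0]
    vocab = set().union(*pos, *neg)
    scores = {}
    for term in vocab:
        p = sum(term in d for d in pos) / max(len(pos), 1)
        n = sum(term in d for d in neg) / max(len(neg), 1)
        scores[term] = abs(p - n)
    return scores


def adapt_scores(scores, test_doc):
    """Stage 2: reweight the base scores for one test sample.

    Assumed weighting: relative frequency of the term in the test document,
    so terms absent from the sample receive weight 0.
    """
    counts = Counter(t for t in test_doc if t in scores)
    total = sum(counts.values())
    if total == 0:
        # No known terms in the sample: fall back to the base scores.
        return dict(scores)
    return {t: s * counts[t] / total for t, s in scores.items()}


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy data, invented for illustration only.
train_docs = [
    ["great", "product", "works"],
    ["love", "great", "design"],
    ["poor", "product", "broke"],
    ["bad", "poor", "design"],
]
train_labels = [1, 1, 0, 0]
test_doc = ["poor", "design", "broke", "broke"]

base = base_scores(train_docs, train_labels)
adapted = adapt_scores(base, test_doc)
print(select_top_k(base, 2))     # selection from training statistics alone
print(select_top_k(adapted, 2))  # selection adapted to this test sample
```

Because the second stage only rescales scores produced by the first, the base metric can be swapped without touching the adaptation step, which is the property the abstract claims for the proposed method.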