Abstract: In a learning process, features play a fundamental role. In this paper, we propose a Boosting-based feature selection algorithm called BoostFS. It extends AdaBoost which is designed for classification problems to feature selection. BoostFS maintains a distribution over training samples which is initialized from the uniform distribution. In each iteration, a decision stump is trained under the sample distribution and then the sample distribution is adjusted so that it is orthogonal to the classification results of all the generated stumps. Because a decision stump can also be regarded as one selected feature, BoostFS is capable to select a subset of features that are irrelevant to each other as much as possible. Experimental results on synthetic datasets, five UCI datasets and a real malware detection dataset all show that the features selected by BoostFS help to improve learning algorithms in classification problems, especially when the original feature set contains redundant features.
External IDs:dblp:journals/ijprai/MiaoCSLQ15
Loading