A meta-heuristic feature selection algorithm combining random sampling accelerator and ensemble using data perturbation
Abstract: Meta-heuristic algorithms have been widely used in feature selection because they can find a globally optimal solution. However, they become prohibitively time-consuming when the number of samples is large. Most studies settle for approximately optimal solutions to reduce the running time, but this introduces a new problem: degraded classification performance, especially classification stability. To address these problems, this paper proposes a new feature selection framework. First, the framework exploits a voting ensemble strategy to improve classification stability by reducing the impact of misclassified labels on the overall classification results. Second, the framework uses a data perturbation strategy to enhance classification accuracy. In particular, data perturbation generates more neighborhood relationships in the dataset, which can reveal how the samples are distributed over the various features. A voting ensemble over these different feature distributions can extract more information from the dataset, so initially misclassified samples are more likely to be assigned to the correct class. Third, the framework incorporates a random sampling accelerator, which reduces the size of the sample space to be searched and thereby the time consumed. Finally, to verify the effectiveness of the proposed framework, four meta-heuristic feature selection methods based on neighborhood rough sets are compared on 20 datasets. The experimental results indicate that the framework improves classification performance and accelerates feature selection, particularly on datasets with large sample sizes.
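The interplay of the three components (random sampling, data perturbation, and voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the perturbation is taken to be additive Gaussian noise, the meta-heuristic search over neighborhood rough sets is replaced by a simple variance-ratio feature ranking (`select_features`), and the base classifier is a nearest-centroid rule. All function and parameter names are hypothetical.

```python
import numpy as np

def select_features(X, y, k):
    """Stand-in selector (NOT a meta-heuristic): keep the k features with the
    largest ratio of between-class to within-class variance."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)
    return np.sort(np.argsort(score)[-k:])

def fit_member(X, y, k, sample_frac, noise_scale, rng):
    """Train one ensemble member on a perturbed random subsample."""
    # Random sampling accelerator: search on a random subset of the samples.
    n = X.shape[0]
    m = max(2, int(sample_frac * n))
    idx = rng.choice(n, size=m, replace=False)
    Xs, ys = X[idx], y[idx]
    # Data perturbation (assumed form): small Gaussian noise scaled per feature.
    Xp = Xs + rng.normal(0.0, noise_scale * X.std(axis=0), size=Xs.shape)
    feats = select_features(Xp, ys, k)
    classes = np.unique(ys)
    centroids = np.stack([Xp[ys == c][:, feats].mean(axis=0) for c in classes])
    return feats, classes, centroids

def predict_member(member, X):
    """Nearest-centroid prediction on the member's selected features."""
    feats, classes, centroids = member
    diff = X[:, feats][:, None, :] - centroids[None, :, :]
    return classes[(diff ** 2).sum(axis=2).argmin(axis=1)]

def vote(predictions):
    """Majority vote over a (n_members, n_samples) array of labels."""
    predictions = np.asarray(predictions)
    winners = []
    for column in predictions.T:
        labels, counts = np.unique(column, return_counts=True)
        winners.append(labels[counts.argmax()])
    return np.array(winners)

def ensemble_predict(X_train, y_train, X_test, n_members=5, k=2,
                     sample_frac=0.3, noise_scale=0.05, seed=0):
    """Fit n_members on differently perturbed subsamples and combine by voting."""
    rng = np.random.default_rng(seed)
    members = [fit_member(X_train, y_train, k, sample_frac, noise_scale, rng)
               for _ in range(n_members)]
    return vote([predict_member(mem, X_test) for mem in members])
```

Each member sees a different subsample and a different perturbation, so the members may select different feature subsets; the vote then limits the influence of any single member's misclassifications. In this sketch, ties in the vote go to the smallest label, which is an arbitrary choice.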