A GA-SVM Feature Selection Model Based on High Performance Computing Techniques

Tianyou Zhang, Xiuju Fu, Rick Siow Mong Goh, Chee Keong Kwoh, Gary Kee Khoon Lee

Published: 2009, Last Modified: 17 Nov 2023, SMC 2009, Readers: Everyone

Abstract: Supervised learning is widely applied in many domains, including bioinformatics, cheminformatics and financial forecasting. However, interference from irrelevant features can degrade classifier accuracy. GA-SVM, a popular feature selection model, is desirable in many such cases because it filters out irrelevant features and thereby improves learning performance. Its high computational cost, however, strongly discourages its application to large-scale datasets. In this paper, an HPC-enabled GA-SVM (HGA-SVM) is proposed that integrates data parallelization, multithreading and heuristic techniques, with the ultimate goals of robustness and low computational cost. The proposed model comprises four improvement strategies: 1) GA parallelization, 2) SVM parallelization, 3) neighbor search and 4) evaluation caching. The four strategies improve different aspects of the feature selection model and collectively contribute to higher computational throughput.
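
The abstract names evaluation caching and neighbor search but does not describe how they are implemented. The following is a minimal Python sketch of how those two ideas might fit into a genetic algorithm over feature masks; it is not the authors' implementation. All names (`make_cached_fitness`, `toy_fitness`, `neighbors`, `run_ga`) are hypothetical, and `toy_fitness` is a cheap stand-in for the SVM cross-validation accuracy that GA-SVM would actually compute. The parallelization strategies are not shown.

```python
import random


def make_cached_fitness(evaluate):
    """Wrap an expensive fitness function with a cache keyed by the feature mask."""
    cache = {}
    stats = {"hits": 0, "misses": 0}

    def cached(mask):
        key = tuple(mask)
        if key in cache:
            stats["hits"] += 1
        else:
            stats["misses"] += 1
            cache[key] = evaluate(key)
        return cache[key]

    return cached, stats


def toy_fitness(mask):
    # Hypothetical stand-in for SVM cross-validation accuracy (assumption):
    # features 0-2 are "relevant"; every other selected feature is penalized.
    relevant = sum(mask[:3])
    irrelevant = sum(mask[3:])
    return relevant - 0.1 * irrelevant


def neighbors(mask):
    """Yield every mask that differs from `mask` in exactly one bit."""
    for i in range(len(mask)):
        flipped = list(mask)
        flipped[i] = 1 - flipped[i]
        yield tuple(flipped)


def run_ga(n_features=8, pop_size=10, generations=15, seed=0):
    rng = random.Random(seed)
    fitness, stats = make_cached_fitness(toy_fitness)
    population = [
        tuple(rng.randint(0, 1) for _ in range(n_features))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: keep the better half; surviving masks hit the cache.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = list(a[:cut] + b[cut:])
            if rng.random() < 0.2:  # occasional bit-flip mutation
                j = rng.randrange(n_features)
                child[j] = 1 - child[j]
            children.append(tuple(child))
        population = parents + children
        # Neighbor search: try all one-bit flips of the current best mask.
        best = max(population, key=fitness)
        candidate = max(neighbors(best), key=fitness)
        if fitness(candidate) > fitness(best):
            population[-1] = candidate
    best = max(population, key=fitness)
    return best, fitness(best), stats
```

Calling `run_ga()` returns the best mask found, its score, and the cache statistics. Because masks that survive selection are re-ranked every generation, most fitness calls become dictionary lookups, which is where the saving would come from when each evaluation is a full SVM training run.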
