Abstract: Discovering biomarker genes from gene expression data is an active research topic for understanding the mechanisms underlying disease etiology. Technologies such as DNA microarrays have made it possible to collect high-dimensional gene expression data, but this high dimensionality also makes it difficult to identify key disease-causing genes. To address this problem, we propose a feature weighting particle swarm optimization method (FWPSO) for efficiently identifying biomarker genes from high-dimensional microarray data. FWPSO has two main phases: 1) Feature Weighting Phase: in each generation, features are classified as relevant or irrelevant based on the evolutionary performance of individuals in the PSO population, and are assigned weights accordingly. 2) Feature Selection Phase: the PSO population focuses its search on the feature set determined to be relevant in the previous phase, which makes it more efficient at removing redundant features and discovering the most related genes. The two phases operate in synergy to produce the final optimized result. Experimental results on four microarray datasets show that FWPSO not only greatly reduces the number of feature dimensions but also achieves higher classification accuracy than other methods, demonstrating the effectiveness of our method. Our implementation of FWPSO is available at https://github.com/wangxb96/FWPSO.
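The two-phase idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' FWPSO implementation (see the repository linked above for that). The weight-update rule, the zero threshold for relevance, the PSO coefficients, and the toy fitness function standing in for classifier accuracy are all assumptions chosen for illustration, as are the names `binary_pso`, `two_phase_sketch`, and `toy_fitness`.

```python
import math
import random


def binary_pso(fitness, n_features, allowed, weights=None,
               n_particles=10, n_iter=30, seed=0):
    """Binary PSO over the `allowed` feature indices; optionally updates `weights`."""
    rng = random.Random(seed)
    allowed = list(allowed)
    allowed_set = set(allowed)
    # Each particle is a 0/1 mask; bits outside `allowed` stay 0 throughout.
    pos = [[1 if j in allowed_set and rng.random() < 0.5 else 0
            for j in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    g = max(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[g][:], fit[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            old_mask, old_fit = pos[i][:], fit[i]
            for j in allowed:
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))
                # Sigmoid transfer: velocity sets the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit[i] = fitness(pos[i])
            # Assumed weighting rule: credit features added in an improving move
            # (or dropped in a worsening one), and penalize the opposite cases.
            if weights is not None and fit[i] != old_fit:
                sign = 1.0 if fit[i] > old_fit else -1.0
                for j in allowed:
                    if pos[i][j] != old_mask[j]:
                        weights[j] += sign if pos[i][j] else -sign
            if fit[i] > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
            if fit[i] > gbest_fit:
                gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


def two_phase_sketch(fitness, n_features, seed=0):
    """Phase 1 weights features; phase 2 searches only the positively weighted ones."""
    weights = [0.0] * n_features
    binary_pso(fitness, n_features, range(n_features), weights=weights, seed=seed)
    # Assumed threshold: a feature counts as relevant if its weight is positive.
    relevant = [j for j in range(n_features) if weights[j] > 0]
    if not relevant:  # fall back to all features if nothing was credited
        relevant = list(range(n_features))
    best, best_fit = binary_pso(fitness, n_features, relevant, seed=seed + 1)
    return weights, relevant, best, best_fit


def toy_fitness(mask, informative=(0, 1, 2)):
    """Hypothetical stand-in for classifier accuracy with a subset-size penalty."""
    hits = sum(mask[j] for j in informative)
    return hits / len(informative) - 0.01 * sum(mask)


weights, relevant, best, best_fit = two_phase_sketch(toy_fitness, n_features=20)
print("relevant features:", relevant)
print("selected features:", [j for j, b in enumerate(best) if b])
```

In practice the fitness function would wrap a classifier evaluated on the selected gene columns, which is what makes restricting the second search to a smaller candidate set worthwhile.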