Metabolomics biomarker discovery using multimodal memetic algorithm and multivariate mutual information based feature selection

Published: 2016, Last Modified: 11 Apr 2025, CEC 2016, CC BY-SA 4.0
Abstract: Metabolomics data typically have small sample sizes, high dimensionality, and considerable noise, which pose great challenges for analysis. In this paper we propose a novel filter feature selection algorithm, MMAFS, for metabolomics biomarker discovery. MMAFS uses a metaheuristic-chain-based multimodal memetic algorithm to select both locally and globally optimal feature subgroups that may carry biological meaning. A nearest-neighbor-graph-based estimate of multivariate mutual information is used to compute fitness values under the max-dependency criterion. Finally, we introduce a semi-wrapper classification step to improve prediction accuracy. MMAFS is applied to three real-world metabolomics spectral data sets. Results from 10 runs of 10-fold external cross-validation show that the proposed algorithm outperforms other representative feature selection methods. In particular, some of the biomarkers found by MMAFS have been confirmed by previous studies.
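The abstract does not spell out the exact estimator, so the following is only an illustrative sketch of how a nearest-neighbour mutual information estimate can serve as a max-dependency fitness I(S; C) for a candidate feature subset S and class label C. It swaps in the k-nearest-neighbour estimator for continuous features versus a discrete label described by Ross (2014), the same family of estimator behind scikit-learn's `mutual_info_classif`. It is not the MMAFS estimator. The function names (`mi_subset_class`, `digamma_int`), the choice k = 3, and the max-norm distance are assumptions made for illustration.

```python
import random
from collections import Counter

EULER_GAMMA = 0.5772156649015329


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def chebyshev(a, b):
    """Max-norm distance between two equal-length feature vectors."""
    return max(abs(u - v) for u, v in zip(a, b))


def mi_subset_class(X, y, subset, k=3):
    """Hypothetical fitness: kNN estimate (Ross 2014 style, NOT the paper's
    estimator) of I(X_S; C) for continuous features S and discrete labels C."""
    pts = [[row[j] for j in subset] for row in X]
    counts = Counter(y)
    # Points whose label occurs only once have no same-class neighbour; drop them.
    keep = [i for i in range(len(y)) if counts[y[i]] > 1]
    n = len(keep)
    if n == 0:
        return 0.0
    sum_psi_k = sum_psi_nc = sum_psi_m = 0.0
    for i in keep:
        # Distances to the other points that share point i's label.
        same = sorted(chebyshev(pts[i], pts[j])
                      for j in keep if j != i and y[j] == y[i])
        k_i = min(k, len(same))
        radius = same[k_i - 1]  # distance to the k-th same-class neighbour
        # m_i: points of any class strictly inside that radius (self included);
        # clamp to 1 so psi stays defined when duplicates give radius 0.
        m_i = max(1, sum(1 for j in keep if chebyshev(pts[i], pts[j]) < radius))
        sum_psi_k += digamma_int(k_i)
        sum_psi_nc += digamma_int(counts[y[i]])
        sum_psi_m += digamma_int(m_i)
    # I = psi(N) + <psi(k)> - <psi(N_c)> - <psi(m)>, clipped at zero.
    mi = digamma_int(n) + (sum_psi_k - sum_psi_nc - sum_psi_m) / n
    return max(0.0, mi)


if __name__ == "__main__":
    rng = random.Random(0)
    y = [0] * 30 + [1] * 30
    # Column 0 separates the two classes; column 1 is label-independent noise.
    X = [[10 * c + rng.random(), rng.random()] for c in y]
    for subset in ([0], [1], [0, 1]):
        print(subset, round(mi_subset_class(X, y, subset), 3))
```

Under the max-dependency criterion a search procedure would prefer subsets with a higher score, so on this toy data `[0]` and `[0, 1]` should score well above the noise-only subset `[1]`. The brute-force O(n²) neighbour search is adequate for the small sample sizes typical of metabolomics; a k-d tree is the usual choice for larger data.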