Minimal-redundancy-maximal-relevance feature selection using different relevance measures for omics data classification
Abstract: Omics refers to fields of study in biology such as genomics, proteomics, and metabolomics. Investigating fundamental biological problems with omics data can increase our understanding of biological systems as a whole. However, omics data are high-dimensional, with far more features than samples, which poses major challenges for classical statistical analysis and machine learning methods. This paper studies minimal-redundancy-maximal-relevance (MRMR) feature selection for omics data classification using three relevance measures: mutual information (MI), correlation coefficient (CC), and maximal information coefficient (MIC). A linear forward search method is used to search for the optimal feature subset. Experimental results on five real-world omics datasets indicate that MRMR feature selection with CC more robustly yields better (or competitive) classification accuracy than MRMR with the other two measures.
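To make the selection procedure concrete, the following is a minimal sketch of greedy forward MRMR selection with a pluggable relevance measure. It is not the authors' implementation: the abstract does not state how relevance and redundancy are combined, so the difference criterion (relevance minus mean redundancy) is an assumption, as are the function names `abs_corr` and `mrmr_forward` and the toy data. The absolute Pearson correlation stands in for the CC measure; an MI or MIC estimator could be passed as `score` instead.

```python
import numpy as np


def abs_corr(a, b):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return 0.0 if denom == 0 else abs(float(a @ b) / denom)


def mrmr_forward(X, y, k, score=abs_corr):
    """Greedily pick up to k feature indices by relevance minus mean redundancy.

    Assumption: the difference form of the MRMR criterion is used here.
    """
    n_features = X.shape[1]
    # Relevance: dependence between each feature and the class labels.
    relevance = np.array([score(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    redundancy_sum = np.zeros(n_features)
    while len(selected) < min(k, n_features):
        last = selected[-1]
        # Accumulate each feature's dependence on the most recently added one.
        redundancy_sum += np.array(
            [score(X[:, j], X[:, last]) for j in range(n_features)]
        )
        criterion = relevance - redundancy_sum / len(selected)
        criterion[selected] = -np.inf  # never re-select a chosen feature
        selected.append(int(np.argmax(criterion)))
    return selected


# Toy data (hypothetical): three mutually orthogonal, zero-mean patterns.
a = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
b = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
c = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)

y = (a + 1) / 2                       # binary class labels (0/1)
X = np.column_stack([
    a + 0.5 * b,                      # column 0: strongly relevant
    a + 0.5 * b + 0.1 * c,            # column 1: near-duplicate of column 0
    a + 2.0 * c,                      # column 2: weaker, but less redundant
    c,                                # column 3: irrelevant to the labels
])

selected = mrmr_forward(X, y, k=3)
print(selected)
```

On this toy data the near-duplicate column 1 is deferred until after the weaker but less redundant column 2, which is the behaviour the redundancy term is meant to produce.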