A Hybrid Feature Selection Algorithm Applied to High-dimensional Imbalanced Small-sample Data Classification
Abstract: With the rapid development of microarray technology and interdisciplinary science, microarray technology can now be used to predict diseases. It offers high speed, high efficiency, and reliability in disease prediction. However, microarray data are usually high-dimensional with few samples, and the samples are often imbalanced, which creates considerable difficulties for researchers. To address these problems, this paper proposes a Filter-Wrapper hybrid feature selection algorithm, Union Information Gini Cost-sensitive Feature Selection General Vector Machine (UIG-CFGVM), for the high-dimensional imbalanced small-sample problem. The hybrid algorithm proceeds in two steps. First, the most common features are removed by the proposed hybrid filter algorithm UIG, which is derived from Information Gain (Info) and Gini Index (Gini). Second, Cost-sensitive Feature Selection General Vector Machine (CFGVM) is used as the Wrapper method to further improve the performance of the algorithm. The experimental results show that the proposed UIG-CFGVM algorithm achieves better classification performance on seven biomedical high-dimensional imbalanced small-sample datasets than other similar algorithms.
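To make the filter stage concrete, the following is a minimal sketch of a union-style filter that scores each feature by information gain and by Gini impurity reduction, then keeps the union of the top-ranked features under each criterion. It is an illustration under stated assumptions, not the authors' UIG: the abstract does not say how the two scores are combined, so the union of two top-k lists, the median split used to discretize each feature, and the names `split_gain` and `union_filter` are all hypothetical. The CFGVM wrapper stage is not covered.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(values, labels, impurity):
    """Impurity reduction from splitting one feature at its median.

    The median split is an assumed discretization, chosen for brevity.
    """
    threshold = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    if not left or not right:
        return 0.0  # a constant feature cannot separate the classes
    n = len(labels)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(labels) - weighted


def union_filter(X, y, k):
    """Return sorted indices in the union of the top-k features per criterion."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    selected = set()
    for impurity in (entropy, gini):  # information gain, then Gini reduction
        scores = [split_gain(col, y, impurity) for col in columns]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


# Toy imbalanced data: 6 samples, 3 features; only feature 0 tracks the label.
X = [
    [0.1, 5.0, 1.0],
    [0.2, 3.0, 1.0],
    [0.3, 4.0, 1.0],
    [0.4, 6.0, 1.0],
    [0.9, 5.5, 1.0],
    [1.0, 3.5, 1.0],
]
y = [0, 0, 0, 0, 1, 1]
selected = union_filter(X, y, k=1)
print(selected)  # [0]
```

In a full filter-wrapper pipeline, the reduced feature set returned here would then be passed to a wrapper method that evaluates candidate subsets with a classifier.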