RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers
Abstract: The era of big data introduces both opportunities and challenges for biomedical researchers. One of the inherent
difficulties in the biomedical research field is to recruit large cohorts of samples, while high-throughput biotechnologies
may produce thousands or even millions of features for each sample. Researchers tend to evaluate
the individual correlation of each feature with the class label and use the incremental feature selection (IFS)
strategy to select the top-ranked features with the best prediction performance. Recent experimental data showed
that a subset of continuously ranked features randomly restarted from a low-ranked feature (an RIFS block) may
outperform the subset of top-ranked features. This study proposed a feature selection algorithm, RIFS2D, that
integrates multiple RIFS blocks. A comprehensive comparative experiment was conducted with the IFS, RIFS
and existing feature selection algorithms and demonstrated that a subset of low-ranked features may also achieve
promising prediction performance. This study suggested that a prediction model with promising performance
may be trained on low-ranked features, even when top-ranked features did not achieve satisfactory prediction
performance. Further comparative experiments were conducted between RIFS2D and t-tests for the detection of
early-stage breast cancer. The data showed that the RIFS2D-recommended features achieved better prediction
accuracy and were targeted by more drugs than the t-test top-ranked features.
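
The difference between IFS and a single RIFS block, as described above, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (rank_features, best_block, ifs, rifs), the use of a Welch t-statistic for ranking, the caller-supplied score function, and the choice to extend every block to the end of the ranking are all assumptions made for illustration. The step in which RIFS2D integrates multiple blocks is not covered.

```python
import random
from statistics import mean, variance


def t_statistic(a, b):
    """Welch t-statistic between two lists of values (assumed ranking measure)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_features(X, y):
    """Return feature indices sorted by decreasing |t| against binary labels y."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(t_statistic(pos, neg)))
    return sorted(range(n_features), key=lambda j: -scores[j])


def best_block(ranking, start, score):
    """Grow a block of consecutively ranked features from `start`.

    `score` is a hypothetical callable mapping a list of feature indices to a
    performance value (for example, cross-validated accuracy). The best-scoring
    block is returned together with its score.
    """
    best_subset, best_score = [], float("-inf")
    for end in range(start + 1, len(ranking) + 1):
        subset = ranking[start:end]
        s = score(subset)
        if s > best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score


def ifs(ranking, score):
    """IFS: only blocks that begin at the top-ranked feature are considered."""
    return best_block(ranking, 0, score)


def rifs(ranking, score, n_restarts=5, seed=0):
    """RIFS sketch: also try blocks restarted from randomly chosen ranks."""
    rng = random.Random(seed)
    starts = [0] + [rng.randrange(len(ranking)) for _ in range(n_restarts)]
    return max((best_block(ranking, s, score) for s in starts), key=lambda r: r[1])
```

Because rank 0 is always among the starting points in this sketch, the block returned by rifs never scores below the one returned by ifs under the same score function. A block that starts lower in the ranking is returned only when it scores strictly higher.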