Abstract: Various model selection methods can be applied to seek sparse subsets of the covariates to explain the response of
interest in bioinformatics. While such methods often offer good predictive performance, their selections of covariates
may be much less trustworthy. Indeed, when the number of covariates is large, the selections can be highly unstable, even
under a slight change of the data. This casts serious doubt on the reproducibility of the identified variables. For a sound scientific
understanding of the regression relationship, methods need to be developed to find the most important covariates that have
a higher chance of being confirmed in future studies. Such a method based on variable selection deviation is proposed and evaluated
in this work.