Novel Truncated-rank Graph-structured and Tree-guided Sparse Linear Mixed Models for Variable Selection on Genome-wide Association Studies
Abstract: Variable selection for genome-wide association studies (GWAS) is a key focus for bioinformatics researchers in high-performance computing. The rapid growth of biological and biomedical data has produced high-dimensional, heterogeneous datasets with non-i.i.d. properties and numerous response variables, which often lead to false positives or false negatives in the recovered associations. When applied naively, traditional methods perform suboptimally because of confounding factors. To account for the complex interdependencies in heterogeneous data and improve the practical outcomes of GWAS, we introduce two methods, TGsLMM and TTsLMM, which balance the effects between response and explanatory variables for subpopulation inference. Our unified framework performs sparse variable selection using graph-structured or tree-guided structures in a low-rank linear mixed model. We further extend the approach to high-dimensional datasets and adaptively select the covariance structure for genomic data. Extensive experiments on synthetic data and three real-world datasets demonstrate the robustness and effectiveness of the proposed methods, which achieve the highest area under the ROC curve among the compared baselines and show promise for future applications.