High Performance Computing Framework for Variable Selection on Genome-wide Association Studies

Xiang Liu, Xueling Liu, Jing Diao, Mengyao Zheng, Jihe Li, Dehui Wei, Qipeng Xie, Xia Li, Linshan Jiang

Published: 2024, Last Modified: 11 Feb 2025, Venue: BIBM 2024, License: CC BY-SA 4.0

Abstract: Variable selection for genome-wide association studies (GWAS) has been a major research focus for decades. With the exponential growth of biological and biomedical data in the era of big data, scientists are confronted with the challenge of extracting meaningful information from vast datasets while managing the inherent heterogeneity in bioinformatics. To date, there are no highly effective tools that support high-dimensional datasets and achieve robust variable selection performance, all while accounting for the non-i.i.d. features and structured relatedness among explanatory and response variables. To address these challenges, we introduce the first high-performance computing framework for variable selection in GWAS. Our framework integrates various state-of-the-art methods, allowing researchers to easily combine different techniques and fully explore their potential. Additionally, our approach employs novel optimization strategies to solve the problem efficiently, even for high-dimensional data with sparse characteristics. By processing the data holistically, the framework delivers comprehensive analysis and accurate linkage mapping associations. Designed for ease of use, the framework is implemented in Python and offers seamless deployment, making it accessible to a wide range of researchers.