SPAGRM: effectively controlling for sample relatedness in large-scale genome-wide association studies of longitudinal traits
Abstract: Sample relatedness is a major confounder in genome-wide association studies
(GWAS), potentially leading to inflated type I error rates if not appropriately
controlled. A common strategy is to incorporate a random effect based on the
genetic relatedness matrix (GRM) into regression models. However, this
approach is challenging for large-scale GWAS of complex traits, such as longitudinal
traits. Here we propose a scalable and accurate analysis framework,
SPAGRM, which controls for sample relatedness via a precise approximation of
the joint distribution of genotypes. SPAGRM can utilize GRM-free models and
thus is applicable to various trait types and statistical methods, including linear
mixed models and generalized estimating equations for longitudinal traits. A
hybrid strategy incorporating saddlepoint approximation greatly increases the
accuracy of analyzing low-frequency and rare genetic variants, especially when
phenotypic distributions are unbalanced. We also introduce SPAGRM(CCT) to
aggregate the results from different models via the Cauchy combination test.
Extensive simulations and real data analyses demonstrated that SPAGRM maintains
well-controlled type I error rates and SPAGRM(CCT) can serve as a broadly
effective method. Applying SPAGRM to 79 longitudinal traits extracted from UK
Biobank primary care data, we identified 7,463 genetic loci, making a pioneering
attempt to conduct GWAS for these traits as longitudinal traits.
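
To illustrate the aggregation step behind SPAGRM(CCT), the following is a minimal sketch of the standard Cauchy combination test for a set of p-values. It is not the authors' implementation: the function name cauchy_combination, the default of equal weights, and the input checks are assumptions made for this example, and the abstract does not state how SPAGRM(CCT) weights the individual models.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Hypothetical helper for illustration; equal weights are an assumption.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * n  # assumed default: every model counts equally
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")

    total_weight = sum(weights)
    # Under the null, tan((0.5 - p) * pi) follows a standard Cauchy
    # distribution, and so does a weighted average of such terms when the
    # weights are non-negative and sum to one.
    statistic = sum(
        (w / total_weight) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Example: p-values for one variant from two hypothetical models.
combined = cauchy_combination([0.01, 0.5])
```

Because the Cauchy tail is dominated by the smallest p-values, the combined result stays close to the strongest individual signal while remaining valid when the component tests are correlated.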