Flexible Models of Functional Annotations to Variant Effects using Accelerated Linear Algebra

Alan Nawzad Amin; Andres Potapczynski; Andrew Gordon Wilson

Published: 05 Mar 2025, Last Modified: 05 Mar 2025. Venue: MLGenX 2025. License: CC BY 4.0

Track: Main track (up to 8 pages)

Abstract: To predict and understand the causes of disease, geneticists build models that predict how a genetic variant impacts phenotype from genomic features. Large projects have sequenced hundreds of thousands of genomes, producing a vast amount of data; yet state-of-the-art models, like LD score regression, cannot leverage this data because their simplifying assumptions limit their flexibility. These assumptions are made to avoid solving the large linear algebra problems introduced by the genomic correlation matrices. In this paper, we leverage modern fast linear algebra techniques to develop WASP (genome Wide Association Studies with Preconditioned iteration), a method to train large and flexible neural network models. On semi-synthetic and real data, we show that WASP better predicts phenotype and better recovers its functional causes compared to LD score regression. Finally, we show that training larger WASP models on larger data leads to better explanations of phenotypes.

Submission Number: 11
