A disease-specific language model for variant pathogenicity in cardiac and regulatory genomics

Huixin Zhan; Jason H. Moore; Zijun Zhang

Published: 01 Jan 2025, Last Modified: 19 May 2025. Nat. Mac. Intell. 2025. CC BY-SA 4.0

Abstract: Clinical variant classification of pathogenic versus benign genetic variants remains a challenge in genetics. Current genomic foundation models have enhanced variant effect prediction (VEP) accuracy through weakly supervised or unsupervised training, yet these models lack disease specificity. Here, to address this, we propose DYNA (disease-specificity fine-tuning via a Siamese neural network), broadly applicable to all genomic foundation models for more effective VEPs in disease contexts. We applied DYNA to the coding VEP in cardiovascular diseases and the non-coding VEP of RNA splicing regulation. These two tasks cover a wide range of specific disease–gene relationships and disease-causing regulatory mechanisms; therefore, their performance will inform the general utility of DYNA. In both cases, DYNA fine-tunes various pretrained genomic foundation models on small rare-variant sets. The DYNA fine-tuned models show superior performance in held-out rare-variant test sets and are further replicated in large, clinically relevant variant annotations in ClinVar. Importantly, we observed that different genomic foundation models excel at different downstream VEP tasks, necessitating a universal tool such as DYNA to fully harness the power of genomic foundation models. Thus, DYNA offers a potent disease-specific VEP method for clinical variant interpretation. DYNA fine-tunes genomic foundation models with disease specificity using a Siamese network. It generalizes to rare-variant test sets and replicates results in ClinVar, advancing variant effect prediction for cardiovascular diseases and RNA splicing.
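The abstract names a Siamese neural network but does not state the training objective. As a rough illustration only, the sketch below assumes a standard pairwise contrastive loss, which is a common choice for Siamese setups and not necessarily the one DYNA uses. The names `shared_encoder`, `contrastive_loss`, and `margin` are hypothetical, and the linear projection merely stands in for a pretrained genomic foundation model.

```python
import math


def shared_encoder(features, weights):
    """Hypothetical stand-in for a foundation model: a linear projection.

    In a Siamese setup the SAME weights embed both variants of a pair.
    """
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def euclidean(u, v):
    """Euclidean distance between two equal-length embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Standard pairwise contrastive loss (an assumption, not DYNA's stated loss).

    Pairs with the same label (e.g. both pathogenic) are pulled together;
    pairs with different labels are pushed apart until they are at least
    `margin` away from each other.
    """
    d = euclidean(emb_a, emb_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Toy usage: identity weights, two made-up variant feature vectors.
weights = [[1.0, 0.0], [0.0, 1.0]]
variant_a = shared_encoder([0.0, 0.0], weights)
variant_b = shared_encoder([3.0, 4.0], weights)
loss_same = contrastive_loss(variant_a, variant_b, same_label=True)
loss_diff = contrastive_loss(variant_a, variant_b, same_label=False, margin=6.0)
```

Because both inputs pass through one shared encoder, the gradient of this loss reshapes a single embedding space, which is why a pairwise objective can be applied even when the labeled variant set is small.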
