A Genomic Language Model for Zero-Shot Prediction of Promoter Indel Effects

Published: 11 Jun 2025, Last Modified: 18 Jul 2025, Venue: GenBio 2025 Spotlight, License: CC BY 4.0
Keywords: genetics, machine learning, variant effect prediction, evolutionary sequences, disease, generative models
TL;DR: A language model trained on evolutionary promoter sequences outperforms existing models at predicting the effects of indel variants in human promoter regions.
Abstract: Disease-associated genetic variants occur extensively across the human genome, predominantly in noncoding regions like promoters. Although insertions and deletions (indels) that disrupt gene expression are crucial for understanding disease mechanisms, current methods struggle to predict their effects. We present LOL-EVE (Language Of Life for Evolutionary Variant Effects), a conditional autoregressive transformer trained on 13.6 million mammalian promoter sequences. By leveraging evolutionary patterns and genetic context, LOL-EVE enables zero-shot prediction of indel effects in human promoters. We introduce three new benchmarks for promoter indel prediction: ultra-rare variant prioritization, causal eQTL identification, and transcription factor binding site disruption analysis. LOL-EVE's dominant performance across these tasks suggests the potential of region-specific genomic language models for identifying causal noncoding variants in disease studies.
Submission Number: 113