Keywords: Genomics, LLM, bioinformatics, compuational biology
TL;DR: LOL-EVE: A language model trained on evolutionary promoter sequences outperforms existing models in predicting the effects of indel variants in human promoter regions.
Abstract: Genetic studies reveal extensive disease-associated variation across the human genome, predominantly in noncoding regions, such as promoters. Quantifying the impact of these variants on disease risk is crucial to our understanding of the underlying disease mechanisms and advancing personalized medicine. However, current computational methods struggle to capture variant effects, particularly those of insertions and deletions (indels), which can significantly disrupt gene expression. To address this challenge, we present LOL-EVE (Language Of Life across EVolutionary Effects), a conditional autoregressive transformer model trained on 14.6 million diverse mammalian promoter sequences. Leveraging evolutionary information and proximal genetic context, LOL-EVE predicts indel variant effects in human promoter regions. We introduce three new benchmarks for indel variant effect prediction in promoter regions, comprising the identification of causal eQTLs, prioritization of rare variants in the human population, and understanding disruptions of transcription factor binding sites. We find that LOL-EVE achieves state-of-the-art performance on these tasks, demonstrating the potential of region-specific large genomic language models and offering a powerful tool for prioritizing potentially causal non-coding variants in disease studies.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7452
Loading