regLM: Designing realistic regulatory DNA with autoregressive language models

Published: 27 Oct 2023, Last Modified: 23 Nov 2023
Venue: GenBio@NeurIPS2023 Poster
Keywords: Autoregressive language modeling, GPT, hyenaDNA, DNA sequence modeling, enhancer design, CRE design, generative sequence modeling
TL;DR: regLM is a new generative approach, based on autoregressive language models, for designing cis-regulatory DNA elements with desired properties, a task with many therapeutic applications.
Abstract: Designing cis-regulatory DNA elements (CREs) with desired properties is a challenging task with many therapeutic applications. Here, we used autoregressive language models trained on yeast and human putative CREs, in conjunction with supervised sequence-to-function models, to design regulatory DNA with desired patterns of activity. Our framework, regLM, compares favorably to existing CRE design approaches at generating realistic and diverse regulatory DNA, while also providing insights into the cis-regulatory code.
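The abstract pairs a generative model with a supervised sequence-to-function model. A minimal sketch of that general generate-then-filter pattern follows. It is not the regLM implementation: the first-order Markov sampler stands in for a trained autoregressive language model, and `toy_activity_score`, the `TATA` motif, the transition weights, and the `design` helper are all hypothetical placeholders chosen only to keep the example self-contained.

```python
import random

ALPHABET = "ACGT"


def sample_sequence(length, transition, rng):
    """Sample one sequence autoregressively: each base is drawn
    conditioned on the previous base (toy stand-in for a language model)."""
    seq = [rng.choice(ALPHABET)]
    for _ in range(length - 1):
        weights = transition[seq[-1]]  # weights are ordered as ALPHABET
        seq.append(rng.choices(ALPHABET, weights=weights, k=1)[0])
    return "".join(seq)


def toy_activity_score(seq, motif="TATA"):
    """Hypothetical stand-in for a supervised sequence-to-function model:
    counts motif occurrences, including overlapping ones."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def design(n_candidates, length, min_score, seed=0):
    """Generate candidates, then keep those the scoring model accepts."""
    rng = random.Random(seed)
    # Assumed transition weights that mildly favor alternating T and A.
    transition = {
        "A": [1, 1, 1, 3],
        "C": [1, 1, 1, 1],
        "G": [1, 1, 1, 1],
        "T": [3, 1, 1, 1],
    }
    candidates = [
        sample_sequence(length, transition, rng) for _ in range(n_candidates)
    ]
    return [s for s in candidates if toy_activity_score(s) >= min_score]


if __name__ == "__main__":
    for s in design(n_candidates=200, length=40, min_score=2)[:5]:
        print(s, toy_activity_score(s))
```

In a real pipeline, the sampler and the scorer would be trained models. The sketch only illustrates how a generator and an independent predictor can be combined to select sequences with a desired predicted activity.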
Supplementary Materials: zip
Submission Number: 83