Motif-Gen: Learning the Compositional Logic of Gene Regulation for De Novo DNA Design

Published: 02 Mar 2026, Last Modified: 02 Mar 2026
Venue: Gen² 2026 Poster
License: CC BY 4.0
Track: Tiny / short paper (2-4 pages)
Keywords: Motif Generation, Regulatory Genomics
TL;DR: We introduce MotifGen, a framework that explicitly enforces regulatory consistency, reducing motif co-occurrence error and significantly improving the mechanistic validity and diversity of designed sequences.
Abstract: Generative models for regulatory DNA have shown promise in designing sequences with targeted transcriptional activities. However, existing state-of-the-art methods, such as Discrete Diffusion (D3), often struggle to capture the complex, higher-order syntax of gene regulation, specifically the precise co-occurrence rules and spatial arrangements of transcription factor binding sites. In this work, we introduce MotifGen, a novel generative framework designed to bridge the gap between predicted functional activity and biologically valid regulatory grammar. We benchmark MotifGen against leading diffusion baselines on human promoter and context-specific fly enhancer tasks. Our evaluations demonstrate that MotifGen significantly outperforms previous methods in reproducing natural motif distributions. These results indicate that MotifGen does not merely optimize for scalar activity labels but effectively learns the underlying cis-regulatory code, offering a robust solution for the interpretable design of synthetic regulatory elements.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 16