Abstract: Recent progress in DNA language models has been increasingly driven by large and complex systems, which can obscure the impact of improvements to standard NLP architectures. In this work, we study whether and how a modernized BERT-style backbone (ModernBERT) can be adapted to genomic sequence modeling to improve computational efficiency, training stability, and long-context handling. Under controlled experimental settings, we benchmark efficiency across a range of sequence lengths and evaluate downstream performance on the Nucleotide Transformer benchmark. The resulting model, ModernGENA, achieves a strong efficiency–quality trade-off and ranks among the top-performing models in our evaluation suite. To support reproducibility and to provide a solid default reference point for future architectural work in genomics, we release the full implementation and configuration of ModernGENA as an open, reusable baseline.
Track: Main track
Keywords: genomic language models, DNA foundation models, ModernGENA, ModernBERT, Transformer encoder, masked language modeling, long-context genomics, inference efficiency, FlashAttention, BPE tokenization, Nucleotide Transformer benchmark, genomics representation learning, foundation models
TLDR: ModernGENA shows that going “back to BERT” with modern engineering yields a strong DNA foundation-model baseline with a favorable accuracy–speed trade-off.
AI Policy Confirmation: I confirm that this submission clearly discloses the role of AI systems and human contributors and complies with the ICLR 2026 Policies on Large Language Model Usage and the ICLR Code of Ethics.
Submission Number: 97