No Clear Winner at Small Scale: Comparing Modern Sequence Architectures and Training Strategies for Genomic Language Models

Published: 11 Jun 2025, Last Modified: 18 Jul 2025 · GenBio 2025 Poster · CC BY 4.0
Keywords: genomic language models, genomics, Mamba, Attention, sequence models
Abstract: Pretrained large language models based on a variety of sequence modeling architectures (e.g., Transformers, Mamba, Hyena) are increasingly being applied beyond natural language processing (NLP). In genomics, they have shown potential to reveal intricate structures and dependencies within DNA sequences, particularly within non-coding regions. To guide a principled development of training methods and architectures in the genomics domain, we examine the most common classes of sequence modeling architectures found in language models and further explore transfer-learning paradigms such as pretraining on large-scale external datasets as well as self-pretraining (on the same data, using a reconstruction loss). In contrast to recent works that focus specifically on finetuning large transformers, we show that the recent recurrent models (Mamba) and implicit convolution-based models (Hyena), which are increasingly used for genomic language models, do not offer an advantage over attention-based Transformer models. To enable thorough and controlled comparisons, we adopt a fixed training pipeline and limit our experiments to relatively small-scale models -- an approach that still aligns well with the performance trends observed in recent studies.
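The self-pretraining paradigm mentioned above can be illustrated with a minimal sketch: train the same model on the downstream data itself with a masked-reconstruction objective before finetuning. The sketch below is a hypothetical illustration, not the paper's implementation; the vocabulary, the small Transformer encoder (used here only as one of the architecture classes compared), the mask rate, and all hyperparameters are assumptions.

```python
# Hypothetical sketch of "self-pretraining": masked reconstruction on the
# same DNA sequences later used for the downstream task. The model size,
# tokenization, and mask rate are illustrative assumptions.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4, "[MASK]": 5}

class TinyDNAEncoder(nn.Module):
    def __init__(self, vocab_size=len(VOCAB), d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)  # reconstruction head

    def forward(self, tokens):
        return self.head(self.encoder(self.embed(tokens)))

def mask_tokens(tokens, mask_rate=0.15, mask_id=VOCAB["[MASK]"]):
    """Replace a random fraction of positions with [MASK]; labels are -100
    (ignored by cross-entropy) everywhere except the masked positions."""
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_rate
    labels[~mask] = -100
    corrupted = tokens.clone()
    corrupted[mask] = mask_id
    return corrupted, labels

model = TinyDNAEncoder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)

# One illustrative step on a random batch of length-200 DNA sequences.
batch = torch.randint(0, 4, (8, 200))
corrupted, labels = mask_tokens(batch)
logits = model(corrupted)
loss = loss_fn(logits.view(-1, len(VOCAB)), labels.view(-1))
loss.backward()
optimizer.step()
```

After this reconstruction phase, the encoder weights would be reused to initialize the downstream (finetuning) model; swapping the Transformer encoder for a Mamba or Hyena block would leave the rest of this recipe unchanged.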
Submission Number: 158