Holistic protean block for long-range DNA sequence modeling

03 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: holistic protean block; DNA sequence modeling; foundation model; biological analysis
Abstract: Modeling DNA sequences, which are defined by a complex interplay of local motifs, long-range dependencies, and periodic patterns, is a fundamental challenge in computational biology. Existing foundation models based on CNNs, Transformers, and SSMs are constrained by static or time-domain-only signal processing operations, which limit the architectural flexibility and multi-domain perspective needed to fully capture the diverse features of DNA. Here, we introduce the **Holistic Protean Block (HPB)**, a novel, scalable architecture that achieves multi-level plasticity through three synergistic layers. Its Locus Plasticity Layer (LPL) provides token-level plasticity by applying token-specific convolution operations, allowing it to precisely model fine-grained local patterns. Its Domain Plasticity Layer (DPL) establishes perspective-level plasticity by concurrently modeling sequential (time-domain) and spectral (frequency-domain) features, enabling it to form multi-domain global representations. Its Saliency Plasticity Layer (SPL) realizes information-flow plasticity by learning saliency scores along dual axes, allowing it to concentrate information flow on the most critical features. Working in tandem, these layers adaptively reshape their computational strategy to extract a holistic representation of diverse genomic patterns. The DNA model built with HPB (**HPB-DNA**) achieves state-of-the-art performance on various genomic benchmarks with quasi-linear complexity, and its design is further validated by in-depth model analyses; together, these results establish HPB as a more powerful and principled paradigm for DNA sequence modeling. Code will be available upon acceptance.
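Since the authors' code is not yet released, the following is not their implementation. It is a minimal NumPy sketch, built on assumptions of its own about the layer internals, meant only to make the three mechanisms named above concrete: a convolution whose kernels are generated from each token's own features (standing in for LPL), a pointwise time-domain branch added to an FFT-based spectral filter whose O(L log L) cost is one plausible route to quasi-linear complexity (standing in for DPL), and sigmoid saliency gates along the sequence axis and the channel axis (standing in for SPL, with the "dual axes" assumed to be these two). All function names, parameter shapes, the layer ordering, and the residual connection are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def locus_plasticity(x, w_gen, k=3):
    """Hypothetical LPL sketch: token-specific depthwise convolution.

    x:     (L, d) token features
    w_gen: (d, d * k) generates, from each token, its own d kernels of odd width k
    """
    L, d = x.shape
    kernels = (x @ w_gen).reshape(L, d, k)      # one kernel per (token, channel)
    pad = k // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))     # zero-pad along the sequence axis
    # windows[t, j, c] = x_pad[t + j, c], the local neighborhood of token t
    windows = np.stack([x_pad[j:j + L] for j in range(k)], axis=1)  # (L, k, d)
    return np.einsum("tcj,tjc->tc", kernels, windows)

def domain_plasticity(x, w_time, freq_filter):
    """Hypothetical DPL sketch: parallel time-domain and frequency-domain mixing.

    w_time:      (d, d) pointwise linear map (sequential view)
    freq_filter: (L // 2 + 1, d) complex per-frequency, per-channel gains (spectral view)
    """
    L = x.shape[0]
    time_branch = x @ w_time
    spectrum = np.fft.rfft(x, axis=0)           # O(L log L) global mixing
    freq_branch = np.fft.irfft(spectrum * freq_filter, n=L, axis=0)
    return time_branch + freq_branch

def saliency_plasticity(x, w_tok, w_ch):
    """Hypothetical SPL sketch: saliency gates along the sequence and channel axes.

    w_tok: (d,)   scores each position
    w_ch:  (d, d) scores each channel from sequence-averaged features
    """
    s_tok = sigmoid(x @ w_tok)[:, None]             # (L, 1)
    s_ch = sigmoid(x.mean(axis=0) @ w_ch)[None, :]  # (1, d)
    return x * s_tok * s_ch                         # gates in (0, 1) never amplify

def hpb_block(x, p):
    """Compose the three sketched layers with a residual connection (assumed)."""
    h = locus_plasticity(x, p["w_gen"], k=p["k"])
    h = domain_plasticity(h, p["w_time"], p["freq_filter"])
    h = saliency_plasticity(h, p["w_tok"], p["w_ch"])
    return x + h

rng = np.random.default_rng(0)
L, d, k = 16, 4, 3
params = {
    "k": k,
    "w_gen": rng.normal(scale=0.1, size=(d, d * k)),
    "w_time": rng.normal(scale=0.1, size=(d, d)),
    "freq_filter": rng.normal(size=(L // 2 + 1, d)) + 1j * rng.normal(size=(L // 2 + 1, d)),
    "w_tok": rng.normal(size=d),
    "w_ch": rng.normal(size=(d, d)),
}
x = rng.normal(size=(L, d))  # stand-in for embedded DNA tokens
y = hpb_block(x, params)
print(y.shape)  # (16, 4)
```

In this sketch only the FFT branch couples distant positions, which is why the block as a whole stays at O(L log L) rather than the O(L^2) of full self-attention; whether HPB obtains its quasi-linear complexity this way is not stated in the abstract.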
Primary Area: foundation or frontier models, including LLMs
Submission Number: 1587