PatchDNA: A Flexible and Biologically-Informed Alternative to Tokenization for DNA

Published: 23 Sept 2025, Last Modified: 26 Sept 2025
Venue: AI4D3 2025 Oral
License: CC BY 4.0
Keywords: DNA, DNA language model, gLM, tokenization, genomic sequence representation
TL;DR: Evolutionary conservation–guided “patch” boundaries focus model capacity on the most functionally important regions, yielding smaller models that nonetheless surpass the current state of the art on benchmarks and, uniquely, permit on-the-fly re-patching.
Submission Number: 7
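
To make the TL;DR concrete, here is a minimal illustrative sketch of score-guided patching. It is not the authors' implementation: the thresholding rule, the function names `patch_boundaries` and `split_into_patches`, and the parameters `threshold` and `max_patch_len` are all assumptions chosen for illustration. It assumes one conservation score per base, starts a new patch at every highly conserved position so that conserved regions get finer patches, and caps patch length elsewhere.

```python
from typing import List


def patch_boundaries(
    conservation: List[float],
    threshold: float = 0.5,   # hypothetical cut-off for "highly conserved"
    max_patch_len: int = 8,   # hypothetical cap on patch length
) -> List[int]:
    """Return the start index of each patch.

    Assumed rule (illustrative only): a new patch starts at position 0,
    at any position whose score is >= threshold, or once the current
    patch has reached max_patch_len bases.
    """
    starts: List[int] = []
    current_len = 0
    for i, score in enumerate(conservation):
        if i == 0 or score >= threshold or current_len >= max_patch_len:
            starts.append(i)
            current_len = 0
        current_len += 1
    return starts


def split_into_patches(seq: str, starts: List[int]) -> List[str]:
    """Cut seq into contiguous patches beginning at the given start indices."""
    ends = starts[1:] + [len(seq)]
    return [seq[s:e] for s, e in zip(starts, ends)]


seq = "ACGTACGTAC"
scores = [0.1, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1]

# Conserved bases at positions 3 and 4 each open their own patch.
fine = split_into_patches(seq, patch_boundaries(scores, 0.5, 4))

# Re-patching the same sequence under a different threshold needs no
# change to the underlying bases, only a new set of boundaries.
coarse = split_into_patches(seq, patch_boundaries(scores, 2.0, 4))
```

Under these assumptions, changing `threshold` or `max_patch_len` re-segments the same sequence without altering any bases, which is one simple way to picture the re-patching that the TL;DR mentions.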