Interpretable prediction of DNA replication origins in S. cerevisiae using attention-based motif discovery
Track: Tiny paper track (up to 4 pages)
Abstract: In a living cell, DNA replication begins at multiple genomic sites called replication origins. Identifying these origins and their underlying base sequence composition is crucial for understanding the replication process. Existing machine learning methods for origin prediction often require labor-intensive feature engineering or lack interpretability. Here, we employ DNABERT to predict yeast replication origins and uncover sequence motifs by combining attention maps with MEME, a classical bioinformatics tool. Our approach eliminates manual feature extraction and identifies biologically relevant motifs across datasets of varying complexity. This work advances interpretable machine learning in genomics, offering a potentially generalizable framework for origin prediction and motif discovery.
Submission Number: 87