Genome-wide nucleotide-resolution model of single-strand break site reveals species evolutionary hierarchy

Abstract: Single-strand breaks (SSBs) are the major form of DNA damage in the genome, arising
spontaneously or from genotoxin exposure, and as intermediates of DNA transactions. SSBs play a crucial role in
various biological processes and show a non-random distribution in the genome. Several SSB
detection approaches, such as S1 END-seq and SSiNGLe-ILM, have emerged to characterize the genomic
landscape of SSBs at nucleotide resolution. However, these sequencing-based methods are costly
and infeasible for large-scale analysis across diverse species. We therefore propose the first
computational approach, SSBlazer, an explainable and scalable deep learning framework for genome-wide
nucleotide-resolution SSB site prediction. We demonstrate that SSBlazer accurately predicts SSB
sites and effectively alleviates false positives, using an imbalanced dataset constructed to simulate
the realistic SSB distribution. Model interpretation analysis reveals that SSBlazer captures
individual CpG patterns in the genomic context and a TGCC motif in the central region as critical
features. Moreover, SSBlazer is a lightweight model that generalizes robustly in cross-species
evaluation, enabling large-scale genome-wide application across diverse species. Strikingly, the
putative genomic SSB landscapes of 216 vertebrates reveal a negative correlation between SSB
frequency and evolutionary hierarchy, suggesting that genomes tend toward greater integrity over
the course of evolution.
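To illustrate why a realistically imbalanced dataset puts false positives under pressure, here is a minimal sketch. It is not the authors' evaluation code, and the recall, false-positive rate, and class ratios are hypothetical values chosen only for illustration. It computes the precision of a fixed classifier as the number of negative (non-SSB) sites per positive site grows:

```python
def precision_at_ratio(recall: float, fpr: float, n_pos: int, neg_per_pos: int) -> float:
    """Precision of a classifier with fixed recall and false-positive rate
    when each positive site is accompanied by `neg_per_pos` negative sites.

    All parameter values used below are hypothetical illustrations.
    """
    n_neg = n_pos * neg_per_pos
    true_pos = recall * n_pos   # positive sites correctly called
    false_pos = fpr * n_neg     # negative sites wrongly called as SSB sites
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # The same hypothetical classifier (95% recall, 1% FPR) on increasingly imbalanced data.
    for ratio in (1, 10, 100):
        p = precision_at_ratio(recall=0.95, fpr=0.01, n_pos=1000, neg_per_pos=ratio)
        print(f"1:{ratio:<3} precision = {p:.3f}")
```

With the classifier held fixed, precision falls from about 0.99 at 1:1 to about 0.49 at 1:100, because false positives scale with the number of negatives. A balanced benchmark can therefore hide a false-positive problem that an imbalanced one exposes.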
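The two features highlighted by the interpretation analysis can be made concrete with a small sketch. For a DNA window centred on a candidate site, it lists the CpG dinucleotides across the whole window and checks whether a TGCC motif lies near the centre. The helper names, the default central half-width, and the toy sequence are hypothetical choices for illustration. This is not SSBlazer's own feature extraction or attribution method.

```python
def cpg_positions(seq: str) -> list[int]:
    """0-based start positions of every CpG dinucleotide in `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def has_central_motif(seq: str, motif: str = "TGCC", half_width: int = 5) -> bool:
    """True if `motif` lies entirely within `half_width` bases of the window centre.

    `half_width` is a hypothetical default, not a value taken from the paper.
    """
    seq, motif = seq.upper(), motif.upper()
    centre = len(seq) // 2
    start = max(0, centre - half_width)
    end = min(len(seq), centre + half_width)
    return motif in seq[start:end]


if __name__ == "__main__":
    # Hypothetical 21-nt toy window: CpGs near both ends, TGCC around the centre.
    toy = "ACGAAAATGCCAAAAACGTTT"
    print("CpG positions:", cpg_positions(toy))
    print("central TGCC:", has_central_motif(toy))
```

Separating a window-wide context feature (CpG occurrences) from a position-restricted one (a motif only counted near the centre) mirrors the distinction the abstract draws between genomic context and the central region.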