CRISPRLand: Interpretable large-scale inference of DNA repair landscape based on a spectral approach

Abstract: We propose a new spectral framework for reliable training, scalable inference, and interpretable explanation of DNA repair outcomes following Cas9 cutting. Our framework, dubbed CRISPRLand, relies on a previously unexploited observation about the nature of the repair process: the landscape of DNA repair is highly sparse in the (Walsh–Hadamard) spectral domain. This observation enables our framework to address key shortcomings that limit the interpretability and scaling of current deep-learning-based DNA repair models. In particular, CRISPRLand reduces the time to compute the full DNA repair landscape from a striking 5230 years to 1 week, and the sampling complexity from 10^12 to 3 million guide RNAs, with only a small loss in accuracy (R^2 ≈ 0.9). Our proposed framework is based on a divide-and-conquer strategy that uses a fast peeling algorithm to learn the DNA repair models. CRISPRLand captures lower-degree features around the cut site, which enrich for short insertions and deletions, as well as higher-degree microhomology patterns, which enrich for longer deletions.
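
To make the sparsity claim concrete, the sketch below computes a Walsh–Hadamard spectrum for a small toy function and counts its nonzero coefficients. It is only an illustration under stated assumptions: `fwht` and `toy_landscape` are hypothetical helper names, the landscape is a made-up function of 8 binary variables with four planted low-degree terms (not real repair data), and the dense transform used here is not the fast peeling algorithm described in the abstract.

```python
import numpy as np


def fwht(values):
    """Dense fast Walsh-Hadamard transform (natural ordering), scaled by 1/N."""
    a = np.asarray(values, dtype=float).copy()
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly over blocks of size 2h: (left, right) -> (left + right, left - right)
        for start in range(0, n, 2 * h):
            left = a[start:start + h].copy()
            right = a[start + h:start + 2 * h].copy()
            a[start:start + h] = left + right
            a[start + h:start + 2 * h] = left - right
        h *= 2
    return a / n


def toy_landscape(n_bits):
    """Hypothetical landscape: a few low-degree terms in +/-1 encoded bits."""
    idx = np.arange(2 ** n_bits)
    # Bit i of each input index x becomes s_i = (-1)^{x_i}.
    signs = 1 - 2 * ((idx[:, None] >> np.arange(n_bits)) & 1)
    return (
        0.5                                                  # degree 0 (mean)
        + 0.3 * signs[:, 0]                                  # degree 1
        - 0.2 * signs[:, 1] * signs[:, 2]                    # degree 2
        + 0.1 * signs[:, 3] * signs[:, 4] * signs[:, 5]      # degree 3
    )


n_bits = 8
spectrum = fwht(toy_landscape(n_bits))
support = np.flatnonzero(np.abs(spectrum) > 1e-9)
# Degree of a coefficient = number of set bits in its index.
degrees = [bin(int(k)).count("1") for k in support]
print(f"{support.size} of {spectrum.size} coefficients are nonzero")
print("support indices:", support.tolist(), "degrees:", degrees)
```

In this toy case only the planted terms survive in the spectrum, and the number of set bits in each surviving index gives the degree of the corresponding interaction. A landscape with this kind of structure can in principle be recovered from far fewer samples than the full input space, which is the property the abstract attributes to the repair landscape.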