Custom CRISPR–Cas9 PAM variants via scalable engineering and machine learning

Rachel A. Silverstein, Nahye Kim, Ann-Sophie Kroell, Russell T. Walton, Justin Delano, Rossano M. Butcher, Martin Pacesa, Blaire K. Smith, Kathleen A. Christie, Leillani L. Ha, Ronald J. Meis, Aaron B. Clark, Aviv D. Spinner, Cicera R. Lazzarotto, Yichao Li, Azusa Matsubara, Elizabeth O. Urbina, Gary A. Dahl, Bruno E. Correia, Debora S. Marks et al. (5 additional authors not shown)

Published: 10 Jul 2025, Last Modified: 25 Jan 2026. Venue: Nature. License: CC BY-SA 4.0
Abstract: Engineering and characterizing proteins can be time-consuming and cumbersome, motivating the development of generalist CRISPR–Cas enzymes [1–4] to enable diverse genome-editing applications. However, such enzymes have caveats such as an increased risk of off-target editing [3,5,6]. Here, to enable scalable reprogramming of Cas9 enzymes, we combined high-throughput protein engineering with machine learning to derive bespoke editors that are more uniquely suited to specific targets. Through structure–function-informed saturation mutagenesis and bacterial selections, we obtained nearly 1,000 engineered SpCas9 enzymes and characterized their protospacer-adjacent motif (PAM) [7] requirements to train a neural network that relates amino acid sequence to PAM specificity. By utilizing the resulting PAM machine learning algorithm (PAMmla) to predict the PAMs of 64 million SpCas9 enzymes, we identified efficacious and specific enzymes that outperform evolution-based and engineered SpCas9 enzymes as nucleases and base editors in human cells while reducing off-targets. An in silico-directed evolution method enables user-directed Cas9 enzyme design, including for allele-selective targeting of the RHO P23H allele in human cells and mice. Together, PAMmla integrates machine learning and protein engineering to curate a catalogue of SpCas9 enzymes with distinct PAM requirements, motivating a shift away from generalist enzymes towards safe and efficient bespoke Cas9 variants. The combination of high-throughput protein engineering with machine learning to curate libraries of CRISPR genome-editing enzymes with distinct genome-targeting properties is described.
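As a rough illustration of how an amino acid sequence might be presented to a neural network of the kind described in the abstract, the sketch below one-hot encodes a short variant and counts the size of a saturated library. It is not the authors' implementation. The number of mutated positions (`N_POSITIONS = 6`) is an assumption that the abstract does not state; it is chosen only because 20^6 = 64,000,000 matches the quoted "64 million" enzymes. The variant string `"DSGERT"` and the names `one_hot`, `library_size`, and `first_variants` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues
N_POSITIONS = 6  # ASSUMPTION: not stated in the abstract; 20**6 matches "64 million"


def one_hot(variant: str) -> list[int]:
    """Flatten a variant into a 0/1 vector of length len(variant) * 20."""
    vec: list[int] = []
    for residue in variant:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1  # mark the residue at this position
        vec.extend(row)
    return vec


# Size of a library in which every position may hold any of the 20 residues.
library_size = len(AMINO_ACIDS) ** N_POSITIONS

# Feature vector for one hypothetical six-residue variant.
features = one_hot("DSGERT")

# Lazily enumerate the library; only the first three variants are materialized.
first_variants = [
    "".join(p) for _, p in zip(range(3), product(AMINO_ACIDS, repeat=N_POSITIONS))
]
```

Under these assumptions, each variant becomes a 120-element input vector, and a trained model would be evaluated once per enumerated variant to predict its PAM preference.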