In silico evaluation of pre-training strategies based on synthetic data for functional DNA generation
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: DNAzymes, AI, GAN, generative models, synthetic data generation, sequence design, pre-training strategies, functional DNA sequences
TL;DR: DNAzyme design is limited by scarce data and few known catalytic cores. We propose pre-training on synthetic data (MFE-bounded and property-matched sequences), domain-grounded constraints, and hierarchical ML/structure-based screening that accounts for cofactors, enabling novel catalytic cores with predictable activity.
Abstract: Deoxyribozymes (DNAzymes) are catalytic single-stranded DNA molecules with a broad range of therapeutic and biotechnological applications, yet their rational design remains severely constrained. The catalytic core is typically limited to a few consensus sequences whose activity is context-dependent and unpredictable, target-site inaccessibility further complicates design, and a critical lack of large, curated datasets has impeded the application of machine learning (ML) to de novo design. This work proposes a multi-stage computational and experimental pipeline for DNAzyme discovery comprising: (i) domain-grounded, DNAzyme-centric parametric constraints for synthetic data generation, model training, and candidate sequence evaluation; (ii) pre-training strategies based on two synthetic data generation approaches, namely sequences whose minimum free energy (MFE) lies within bounds that match stable structures and sequences that match the property distributions of real DNAzymes, with each strategy evaluated by its ability to produce sequences structurally similar to real DNAzymes; and (iii) a hierarchical screening approach that combines ML models and structural priors to select candidates for laboratory validation, taking into account plausible cofactors crucial for catalytic activity. Overall, this pipeline provides a resource-efficient strategy for exploring novel DNAzyme catalytic cores and enabling predictable DNAzyme activity.
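The property-matching idea in (ii) can be illustrated with a minimal sketch. The abstract does not state which properties are matched, so this example assumes just two, sequence length and GC content, and uses rejection sampling: random DNA sequences are drawn with lengths taken from a reference set and kept only if their GC content falls within mean ± k·stdev of the reference distribution. The function names, the parameter `k`, and the reference strings are hypothetical toy values, not real DNAzymes or the authors' implementation; the MFE-bounded variant would additionally require a DNA secondary-structure folding tool and is not shown.

```python
import random
import statistics


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def property_matched_sequences(reference, n, k=1.0, seed=0, max_tries=100_000):
    """Hypothetical sketch: rejection-sample random DNA matching reference properties.

    Lengths are drawn from the reference lengths; a candidate is accepted only if
    its GC content lies within mean +/- k * (population) stdev of the reference GC
    values. Stops after n acceptances or max_tries attempts, whichever comes first.
    """
    rng = random.Random(seed)  # seeded for reproducible synthetic sets
    gcs = [gc_content(s) for s in reference]
    mu, sd = statistics.mean(gcs), statistics.pstdev(gcs)
    lengths = [len(s) for s in reference]

    accepted = []
    tries = 0
    while len(accepted) < n and tries < max_tries:
        tries += 1
        length = rng.choice(lengths)
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if abs(gc_content(cand) - mu) <= k * sd:
            accepted.append(cand)
    return accepted


# Toy reference strings (hypothetical, for illustration only).
reference = ["GGCTAGCTACAACGA", "ATATCGATTACGATA", "GGCCGCATGCCGAGC"]
synthetic = property_matched_sequences(reference, n=5, k=1.0, seed=1)
print(synthetic)
```

Raising `k` widens the acceptance window and yields more diverse but less reference-like sequences; in a fuller pipeline further properties (for example an MFE range) would simply be added as extra acceptance conditions.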
Submission Number: 291