Hierarchical Assembly of Long DNA Libraries from Short Oligonucleotide Pools

Published: 05 Mar 2025, Last Modified: 07 May 2025, MLGenX 2025, CC BY 4.0
Track: Main track (up to 8 pages)
Abstract: Large-scale screening and high-throughput experimental data generation are essential for advancing AI-driven genomics research. However, both are generally constrained by the length limit of chip-synthesized oligo pools ($<300$ bp). In addition, synthesizing gene-sized DNA sequences at scale remains economically infeasible, which makes it difficult to validate the experimental performance of certain machine learning models or to generate new datasets for further training. To address this challenge, we developed a novel method for the high-throughput assembly of gene-sized DNA sequences starting from cost-effective chip-synthesized oligo pools. Instead of Polymerase Cycling Assembly (PCA), we used Golden Gate Assembly (GGA) to ligate the short DNA fragments. With this approach, we assembled high-quality DNA libraries containing up to 96 gene-sized sequences (600 bp) in a single-pot reaction, from which individual sequences can be conveniently retrieved. If many such reactions are run in parallel, for example in a 96-well plate, we can readily assemble up to 9,216 (96 × 96) genes. Combined with advances in automation, this enables efficient and cost-effective synthesis of gene-sized DNA sequences at scale, thereby accelerating experimental data generation for the machine learning community.
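The abstract does not describe how fragments are designed for GGA. The following is a hedged illustration of the general idea only and is not the authors' pipeline. A 600 bp target is cut into pieces that fit an oligo length budget, and neighboring pieces share a 4-nt junction that becomes the sticky end after Type IIS digestion. Junction overhangs must be non-palindromic and distinct, including from each other's reverse complements, so that fragments can ligate only in the intended order. The names `split_for_gga`, `overhang_ok` and `revcomp`, and the `max_insert` budget, are hypothetical. Here `max_insert` means the payload left after subtracting the Type IIS sites and primer flanks. The vector-facing ends are omitted.

```python
import random
from typing import List, Set, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def overhang_ok(oh: str, used: Set[str]) -> bool:
    """Hypothetical check: a junction overhang must be non-palindromic and must
    not equal, or pair with, any overhang already used in the same pot."""
    rc = revcomp(oh)
    return oh != rc and oh not in used and rc not in used


def split_for_gga(gene: str, max_insert: int, oh_len: int = 4) -> Tuple[List[str], List[str]]:
    """Greedily cut `gene` into fragments of at most `max_insert` nt.

    Neighboring fragments share `oh_len` nt at each junction; that shared
    sequence is what becomes the sticky end after Type IIS digestion.
    Returns (fragments, junction_overhangs).
    """
    gene = gene.upper()
    fragments: List[str] = []
    overhangs: List[str] = []
    used: Set[str] = set()
    start = 0
    while len(gene) - start > max_insert:
        # Try the furthest junction first so each fragment is as long as allowed.
        for p in range(start + max_insert - oh_len, start, -1):
            oh = gene[p:p + oh_len]
            if overhang_ok(oh, used):
                break
        else:
            raise ValueError(f"no valid overhang found after position {start}")
        used.add(oh)
        overhangs.append(oh)
        fragments.append(gene[start:p + oh_len])  # ends with the junction overhang
        start = p                                 # next fragment starts with it
    fragments.append(gene[start:])
    return fragments, overhangs


if __name__ == "__main__":
    rng = random.Random(0)
    gene = "".join(rng.choice("ACGT") for _ in range(600))  # hypothetical 600 bp target
    frags, ohs = split_for_gga(gene, max_insert=200)
    for f in frags:
        print(len(f), f[:8] + "...")
    print("junction overhangs:", ohs)
```

The search is greedy, so the last fragment can come out short. A practical design would balance fragment lengths. It would also pick the overhang set using measured ligation-fidelity data rather than this simple uniqueness check.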
Submission Number: 74