Track: Full / long paper (5-8 pages)
Keywords: Transcription terminator design, Sequence-to-function prediction, Surrogate-guided design, Constraint-aware generation
TL;DR: We created two open-source tools (TerSP and TerFac) to predict and generate intrinsic transcription terminators, and validated them across organisms and in both in vitro and in vivo assays.
Abstract: Intrinsic transcription terminators are central to the modularity and predictability of synthetic gene circuits. We used a curated library of 582 bacterial terminators to train a predictive model and, with this model as a surrogate, developed open-source tools for terminator performance prediction and design. Each sequence was encoded by 130 sequence-derived descriptors across four regions: A-tract, hairpin, loop, and U-tract. After performance-based feature selection, 16 high-impact features were retained and used to compare predictive models. An XGBoost model tuned by grid search achieved the best average performance, outperforming the previously reported model as well as linear regression, MLPRegressor, and ensemble approaches. SHAP analysis indicated that U-tract features from a more distal region than previously described are important, and that the influence of initial hairpin GC content extends beyond the expected range. From the final model, we implemented two tools. The Terminator Strength Predictor (TerSP) computes features from an input sequence and returns a quantitative strength and a binary strong or weak class. Validation with experimentally characterized terminators from four bacteria not represented in the training dataset showed that the model reproduces relative efficiency rankings and assigns consistent classes. The Terminator Factory (TerFac) performs surrogate-based optimization for target-driven design under user-defined strength and length constraints. It enabled enumeration of length-specific sets of maximally strong terminators, design of synthetic terminators (TK and miniTK), and optimization of a native T7 terminator sequence. Designs were validated in vivo in Escherichia coli using the original library protocol. In addition, a new in vitro transcription assay based on fluorescent RNA aptamers (Broccoli and Mango III) was developed to further characterize the terminators. In vivo, TK exceeded the strongest reference terminator (Tmax) and miniTK showed high efficiency, while in vitro both showed high performance. These results indicate that the model captures sequence-to-function relationships that support prediction with TerSP and design with TerFac.
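To make the idea of surrogate-based, constraint-aware design concrete, the following is a minimal sketch and not the TerFac implementation. It assumes a hypothetical hand-written scoring function (`toy_surrogate`) in place of the trained XGBoost model, a handful of illustrative descriptors instead of the 16 selected features, and a simple hill-climbing search. All function names, weights, and default region lengths are assumptions made for illustration.

```python
import random

# Translation table for DNA complement (sequences are written in DNA alphabet).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, used to close the hairpin stem."""
    return seq.translate(_COMPLEMENT)[::-1]


def region_features(a_tract, stem, loop, u_tract):
    """Illustrative descriptors for the four regions named in the abstract."""
    def frac(seq, letters):
        return sum(seq.count(c) for c in letters) / len(seq) if seq else 0.0

    return {
        "a_tract_A": frac(a_tract, "A"),
        "stem_GC": frac(stem, "GC"),
        "loop_len": len(loop),
        "u_tract_U": frac(u_tract, "TU"),
    }


def toy_surrogate(feats):
    """Hypothetical stand-in score in [0, 1]; NOT the trained model."""
    return 0.4 * feats["stem_GC"] + 0.4 * feats["u_tract_U"] + 0.2 * feats["a_tract_A"]


def assemble(parts):
    """Join A-tract, stem, loop, complementary stem, and U-tract."""
    a_tract, stem, loop, u_tract = parts
    return a_tract + stem + loop + revcomp(stem) + u_tract


def design(target, lengths=(8, 8, 4, 8), max_len=40, steps=2000, seed=0):
    """Hill-climb point mutations so the surrogate score approaches `target`.

    `lengths` gives (A-tract, stem, loop, U-tract) sizes; the stem is counted
    twice in the assembled sequence, which must not exceed `max_len`.
    """
    if sum(lengths) + lengths[1] > max_len:
        raise ValueError("region lengths exceed max_len")
    rng = random.Random(seed)
    parts = ["".join(rng.choice("ACGT") for _ in range(n)) for n in lengths]
    best = abs(toy_surrogate(region_features(*parts)) - target)
    for _ in range(steps):
        i = rng.randrange(4)
        if not parts[i]:
            continue
        j = rng.randrange(len(parts[i]))
        cand = list(parts)
        cand[i] = parts[i][:j] + rng.choice("ACGT") + parts[i][j + 1:]
        err = abs(toy_surrogate(region_features(*cand)) - target)
        if err <= best:  # accept improvements and neutral moves
            parts, best = cand, err
    return assemble(parts), toy_surrogate(region_features(*parts))


if __name__ == "__main__":
    seq, score = design(target=1.0)
    print(seq, round(score, 3))
```

In this sketch the length constraint is enforced before the search starts and the strength target enters only through the error term, so swapping `toy_surrogate` for a trained regressor would leave the search loop unchanged.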
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 17