Keywords: RNA editing, A-to-I editing, ADAR enzymes, base editors, secondary structure, RNA sequence, RNA-binding proteins, machine learning, large language models, GPT-3.5, RNA editing prediction, data augmentation, threshold adjustment, ViennaRNA, biomedical AI, generative AI, classification, RNA editing efficiency, therapeutic RNA interventions, deep learning, secondary structure prediction
TL;DR: We introduce a novel methodology that fine-tunes GPT-3.5 to predict RNA editing sites, framing the problem as both a generation and classification task, using RNA sequence and secondary structure data to enhance accuracy for therapeutic applications.
Abstract: Accurately predicting RNA editing sites is crucial for leveraging endogenous base editing technologies for therapeutic applications. This study introduces a novel methodology leveraging advanced AI techniques, specifically OpenAI's GPT-3.5, to predict both the occurrence and efficiency of RNA editing by base editors such as ADAR enzymes. By fine-tuning GPT models on extensive datasets of RNA sequences and secondary structures, we observe improvements in predictive accuracy, with our approach outperforming existing approaches. Our approach involves framing the problem in two distinct ways: as a generation problem, predicting new edited structures, and as a classification problem, determining if specific sites are edited. We also implement robust data augmentation strategies and threshold adjustments to optimize the model's performance. Our findings highlight the transformative potential of GPT in solving complex biological problems, providing a robust framework for future genetic interventions.
The sources of this work are available at our repository: https://github.com/Scientific-Computing-Lab/GPT_RNA_Editing_Detection
Submission Number: 13
Loading