TL;DR: BAnG is a Bidirectional Anchored Generation method specifically tailored for conditioned RNA design
Abstract: Designing RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Existing computational approaches require a substantial amount of experimentally determined RNA sequences for each specific protein or a detailed knowledge of RNA structure, restricting their utility in practice. To address this limitation, we develop RNA-BAnG, a deep learning-based model designed to generate RNA sequences for protein interactions without these requirements. Central to our approach is a novel generative method, Bidirectional Anchored Generation (BAnG), which leverages the observation that protein-binding RNA sequences often contain functional binding motifs embedded within broader sequence contexts. We first validate our method on generic synthetic tasks involving localized motifs similar to those appearing in RNAs, demonstrating its benefits over existing generative approaches. We then evaluate our model on biological sequences, showing its effectiveness for conditional RNA sequence design given a binding protein.
Lay Summary: Designing RNA molecules that bind to specific proteins is important for both understanding biology and developing new therapies. Experimental approaches, however, are often time-consuming and costly. While AI tools offer a faster alternative, current models face significant limitations. Some must be trained separately for each protein, which depends on data that is often unavailable. Others rely on having the precise 3D structure of the RNA, a detail that is rarely accessible.
To overcome these barriers, we developed a new way to generate RNA sequences by starting at the point where the RNA binds to the protein and building outward. This approach outperforms existing methods on simplified tasks and enables us to build a general model, RNA-BAnG, that works on any protein without extra training. We computationally validated its performance on real biological data and showed it can successfully design binding RNA sequences for a wide range of proteins.
By removing the need for hard-to-get data and making RNA design more flexible, RNA-BAnG offers a powerful new tool for biology and medicine. To support further research and applications, we released our model and code publicly. We hope our results will encourage experimental biologists to validate RNA-BAnG in the lab and explore new ways to use it in RNA research and medicine.
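The "start at the binding point and build outward" idea from the summary can be illustrated with a toy sketch. This is not the RNA-BAnG model or its released code: the anchor string, the `STOP` marker, the `toy_sampler` function, and the strict left/right alternation are all assumptions made for illustration, and the sampler draws nucleotides at random instead of using a learned, protein-conditioned network.

```python
import random

NUCLEOTIDES = "ACGU"
STOP = "$"  # hypothetical marker meaning "stop extending on this side"


def toy_sampler(context, direction, rng):
    """Hypothetical stand-in for a learned model.

    Picks the next token uniformly at random, with a fixed 10% chance of
    stopping. A real model would condition on the context, the direction,
    and the target protein.
    """
    if rng.random() < 0.1:
        return STOP
    return rng.choice(NUCLEOTIDES)


def generate_outward(anchor, sampler, max_len=50, seed=0):
    """Grow a sequence around `anchor`, alternating right and left extensions."""
    rng = random.Random(seed)
    seq = list(anchor)
    open_sides = {"left": True, "right": True}
    direction = "right"
    # Continue until both sides have stopped or the length cap is reached.
    while len(seq) < max_len and any(open_sides.values()):
        if open_sides[direction]:
            token = sampler("".join(seq), direction, rng)
            if token == STOP:
                open_sides[direction] = False
            elif direction == "right":
                seq.append(token)      # extend toward the 3' end
            else:
                seq.insert(0, token)   # extend toward the 5' end
        direction = "left" if direction == "right" else "right"
    return "".join(seq)


if __name__ == "__main__":
    # "GGAC" is an arbitrary example anchor, not a motif taken from the paper.
    print(generate_outward("GGAC", toy_sampler))
```

Because tokens are only ever prepended or appended, the anchor stays intact inside the generated sequence, which mirrors the notion of a binding motif embedded in a broader sequence context.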
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/rsklypa/RNA-BAnG
Primary Area: Deep Learning
Keywords: generative modeling, sequential data, autoregressive generation, RNA design, biological sequences, deep learning models
Submission Number: 6681