Keywords: RNA 3D structure prediction; AlphaFold 3; RNA language model
Abstract: Predicting RNA 3D structure from sequence remains challenging because RNA molecules are structurally flexible and experimentally resolved structures are scarce. We ask how self-supervised RNA language models (LMs), trained on millions of RNA sequences, can best enhance AlphaFold 3 (AF3) for RNA structure prediction. Using an open-source AF3 reproduction, we run controlled experiments that hold data and hyperparameters fixed while varying the fusion position and the fusion method. Performance varies widely across these choices: the strongest gains come from additive fusion applied at the middle or late stages of the conditional network, which refines AF3’s single representations with RNA LM embeddings. On RecentPDB-RNA (67 newly released structures), our best model sets a new state of the art with an average TM-score of 0.472 (+21\% over AF3) and a 33\% success rate (TM-score $\ge$ 0.6), more than double AF3’s 15\%. On 11 CASP16-RNA targets, it matches trRosettaRNA, the best automated system. These results show that properly fused RNA LM features substantially advance RNA 3D structure prediction. We will release the data, code, and model weights to support open science, reproducibility, and the development of automated RNA structure prediction models.
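As a rough illustration of what additive fusion of LM embeddings into a per-token single representation could look like, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`project`, `additive_fusion`), the use of a single learned linear projection, and the toy dimensions are all assumptions made for this example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(embedding: Vector, weight: Matrix) -> Vector:
    """Apply a linear map; `weight` has shape (out_dim, in_dim)."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def additive_fusion(
    single: List[Vector],   # per-token single representation (hypothetical shape: L x c_s)
    lm_emb: List[Vector],   # per-token RNA LM embedding (hypothetical shape: L x d_lm)
    weight: Matrix,         # assumed learned projection, shape (c_s, d_lm)
) -> List[Vector]:
    """Add projected LM embeddings to the single representation, token by token."""
    if len(single) != len(lm_emb):
        raise ValueError("one LM embedding is required per token")
    fused = []
    for s_i, e_i in zip(single, lm_emb):
        p_i = project(e_i, weight)
        if len(p_i) != len(s_i):
            raise ValueError("projection output must match the single-representation width")
        fused.append([s + p for s, p in zip(s_i, p_i)])
    return fused


# Toy example: 2 tokens, c_s = 2, d_lm = 3.
single = [[1.0, 2.0], [0.0, 0.0]]
lm_emb = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
fused = additive_fusion(single, lm_emb, weight)
```

In this sketch, a projection initialized to all zeros leaves the single representation unchanged, so a fused model would start out behaving like the unfused baseline. Whether the paper initializes its fusion layer this way is not stated in the abstract.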
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20867