NC-Bench and NCfold: A Benchmark and Closed-Loop Framework for RNA Non-Canonical Base-Pair Prediction
Keywords: RNA secondary structure prediction, RNA non-canonical base pair, RNA foundation model
TL;DR: We introduce NC-Bench, the first benchmark for RNA non-canonical base-pair prediction, and NCfold, a novel framework integrating structural priors from RNA foundation models, establishing a systematic foundation for advancing RNA structure modeling.
Abstract: RNA secondary structure forms the basis for folding and function, with non-canonical (NC) interactions indispensable for catalysis, regulation, and molecular recognition.
Despite their importance, predicting NC base pairs remains challenging due to the absence of a standardized benchmark for systematic evaluation.
To address this, we introduce NC-Bench, the first benchmark dedicated to NC base-pair prediction. NC-Bench provides 925 curated RNA sequences with 6,708 high-quality NC annotations, fine-grained edge and orientation classification tasks, and IsoScore-based embedding evaluation, offering a rigorous foundation for systematic assessment.
Building on this, we propose NCfold, a dual-branch framework that couples sequence features with structural priors derived from RNA foundation models (RFMs) via Representative Embedding Fusion (REF) and REF-weighted self-attention.
The closed-loop design iteratively refines sequence and structure representations, alleviating data sparsity and enhancing predictive accuracy.
Experiments on NC-Bench show that NCfold outperforms existing methods, with zero-shot and ablation studies confirming its effectiveness and underscoring the need for NC-specific benchmarks.
Together, NC-Bench and NCfold establish a systematic foundation for NC base-pair prediction, advancing our understanding of RNA structure and enabling next-generation RNA-centric applications. The datasets and code are publicly available at https://github.com/heqin-zhu/NCBench.
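To make the edge and orientation classification tasks concrete, the sketch below enumerates one plausible label space, assuming the benchmark follows the widely used Leontis-Westhof nomenclature (three interacting edges, two glycosidic-bond orientations). The abstract does not specify the label encoding, so the names `EDGES`, `ORIENTATIONS`, `lw_families`, and `is_non_canonical` are hypothetical, and treating G-U wobble as canonical is likewise an assumption.

```python
from itertools import combinations_with_replacement

# Assumed label vocabulary (Leontis-Westhof nomenclature), not taken from NC-Bench itself.
EDGES = ("W", "H", "S")      # Watson-Crick, Hoogsteen, Sugar edge
ORIENTATIONS = ("c", "t")    # cis, trans

# Assumption: A-U, G-C and G-U wobble count as canonical when paired cis W/W.
CANONICAL_BASES = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def lw_families():
    """Return the 12 geometric families: 2 orientations x 6 unordered edge pairs."""
    return [
        f"{orientation}{edge_a}{edge_b}"
        for orientation in ORIENTATIONS
        for edge_a, edge_b in combinations_with_replacement(EDGES, 2)
    ]


def is_non_canonical(base_i, base_j, family):
    """A pair is canonical only if it is cWW between A-U, G-C or G-U bases."""
    is_canonical = family == "cWW" and frozenset((base_i, base_j)) in CANONICAL_BASES
    return not is_canonical


if __name__ == "__main__":
    print(lw_families())
    print(is_non_canonical("G", "A", "tHS"))
```

Under this encoding, edge classification predicts which edge each nucleotide contributes to a pair, and orientation classification predicts cis versus trans; the released dataset may organize these labels differently.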
Primary Area: datasets and benchmarks
Submission Number: 724