RNAscope: Benchmarking RNA Language Models for RNA Sequence Understanding

12 May 2025 (modified: 30 Oct 2025) · Submitted to NeurIPS 2025 Datasets and Benchmarks Track · CC BY 4.0
Keywords: RNA Language models, RNA Sequence Understanding, Benchmarks, Structure Tasks, Interaction Tasks, Function Tasks
TL;DR: We present RNAscope, a comprehensive benchmark of 1,253 experiments that systematically evaluates RNA language models across structure, interaction, and function tasks, addressing key limitations of prior work in capturing RNA biology's complexity.
Abstract: Pre-trained language models (pLMs) have advanced our understanding of RNA biology. However, current evaluation frameworks remain limited in capturing the inherent complexity of RNA, leading to insufficient and biased assessments that hinder practical application. Here, we introduce RNAscope, a comprehensive benchmarking framework designed to evaluate RNA pLMs on structure prediction, interaction classification, and function characterization. The framework comprises 1,253 experiments spanning diverse subtasks of varying complexity and enables systematic model comparison with consistent architectural modules. Our assessment shows that existing models still struggle to generalize across RNA families, target contexts, and environmental features. RNAscope provides a systematic, robust, and fair evaluation framework to accelerate RNA modeling.
Croissant File: json
Dataset URL: https://kaggle.com/datasets/b0aeeed2f6b3dfd43c1ab33b58467a466308d056b6868d861ad00e1074ce384d
Code URL: https://anonymous.4open.science/r/RNAscope
Primary Area: AI/ML Datasets & Benchmarks for health sciences (e.g., climate, health, life sciences, physics, social sciences)
Submission Number: 2439