Beyond model organisms: robust prediction of functional properties across protein evolution

Published: 24 Sept 2025, Last Modified: 26 Dec 2025, NeurIPS2025-AI4Science Poster, CC BY 4.0
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: genomics, disordered proteins, protein discovery, gene expression, active learning
TL;DR: Robust functional prediction requires surrogate models trained across protein evolution.
Abstract: Biological discovery and design are increasingly being guided by surrogate models trained on data from high-throughput assays in place of costly experimentation. However, existing datasets are often biased due to an overrepresentation of model organisms, leading to failures when performing evolutionary studies in non-model species. We present a hybrid framework that leverages high-throughput molecular assays and active learning to quantify biological properties across evolutionary space. We focus on transcriptional activators, which contain activation domains (ADs) that promote gene expression. ADs are intrinsically disordered and poorly conserved, which limits their study using alignment-based algorithms. Here, we develop ADhunter, a high-capacity regression model that outperforms state-of-the-art algorithms in identifying and quantifying the strength of ADs. Predictive uncertainty was used to guide evolutionary sampling across 7,842,516 proteins from 2,400 fungal genomes. We functionally characterized 9,836 ADs from 1,071 fungal genomes, providing a 15.5-fold expansion in genome representation compared to existing datasets. Comprehensive sampling improved model generalizability and provided the first functional annotation for 3,416 proteins in non-model fungi, highlighting the importance of sampling from non-model genomes to build evolutionarily robust models for predicting biological properties.
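Illustrative sketch (not from the paper): the abstract states that predictive uncertainty guided which sequences were sampled, but does not say how uncertainty was estimated. One common approach is to rank candidates by the disagreement of an ensemble of predictors and send the most uncertain ones to the assay. The function name `select_most_uncertain` and the two toy scorers below are hypothetical stand-ins, not ADhunter or its acquisition rule.

```python
import statistics
from typing import Callable, List, Sequence


def acidic_fraction(seq: str) -> float:
    """Toy scorer (assumption): fraction of acidic residues D and E."""
    return sum(seq.count(aa) for aa in "DE") / len(seq)


def aromatic_fraction(seq: str) -> float:
    """Toy scorer (assumption): fraction of aromatic residues F, W and Y."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def select_most_uncertain(
    candidates: Sequence[str],
    ensemble: Sequence[Callable[[str], float]],
    k: int,
) -> List[str]:
    """Return the k candidates on which the ensemble members disagree most.

    Disagreement is the population standard deviation of the members'
    predictions for each sequence.
    """
    scored = []
    for seq in candidates:
        predictions = [model(seq) for model in ensemble]
        scored.append((statistics.pstdev(predictions), seq))
    # Highest disagreement first; ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seq for _, seq in scored[:k]]


if __name__ == "__main__":
    pool = ["DDEE", "AAAA", "DFWE"]
    ensemble = [acidic_fraction, aromatic_fraction]
    print(select_most_uncertain(pool, ensemble, k=1))
```

In a real active-learning loop, the selected sequences would be measured experimentally, added to the training set, and the models retrained before the next round of selection.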
Submission Number: 313