TusoAI: Agentic Optimization for Scientific Methods

ICLR 2026 Conference Submission20069 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0

Keywords: AI for Science, Agentic AI, Code Optimization, AutoML

TL;DR: We present TusoAI, an agentic approach to constructing scientific methods, either from scratch, or improving upon a state-of-the-art tool.

Abstract: Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights into efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems either focus on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems in genetics improved existing computational methods (a 40% power improvement to scDRS in associating cells with disease in simulations and a 10.5% enrichment improvement to pgBoost for identifying ground-truth variant-gene pairs) and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes (e.g., primary biliary cirrhosis with central memory T cells) and 7 previously unreported links between disease variants and their target genes (e.g., glucose/HbA1c risk variant rs138917529 with GCK). Our code will be publicly available upon publication.

Supplementary Material: zip

Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)

Submission Number: 20069
