Keywords: AI for Science, Agentic AI, Code Optimization, AutoML
TL;DR: We present TusoAI, an agentic approach to constructing scientific methods, either from scratch or by improving upon a state-of-the-art tool.
Abstract: Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights in efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems focus either on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning, without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description together with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems
in genetics improved existing computational methods (a 40% power improvement to scDRS for associating cells with disease in simulations and a 10.5% enrichment improvement to pgBoost for identifying ground-truth variant-gene pairs) and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes (e.g., primary biliary cirrhosis with central memory T cells) and 7 previously unreported links between disease variants and their target genes (e.g., glucose/HbA1c risk variant rs138917529 with GCK). Our code will be publicly available upon publication.
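The abstract describes iterative optimization, scored by a task-supplied evaluation function, over a pool of candidate solutions. The following is a minimal sketch of that general pattern, not TusoAI's implementation. It assumes that candidates are plain numbers and that a random perturbation stands in for the LLM-driven proposal, knowledge-tree, and diagnosis steps. The names `optimize_pool`, `evaluate`, and `propose` are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def optimize_pool(
    initial: List[float],
    evaluate: Callable[[float], float],
    propose: Callable[[float, random.Random], float],
    pool_size: int = 4,
    iterations: int = 50,
    seed: int = 0,
) -> List[Tuple[float, float]]:
    """Hypothetical pool-based search: keep the best `pool_size` (score, candidate) pairs.

    This is a generic greedy sketch. `propose` is a stand-in for whatever
    component generates new candidates (in TusoAI, an LLM-based agent).
    """
    rng = random.Random(seed)
    # Score every starting candidate with the task's evaluation function.
    pool = sorted(((evaluate(c), c) for c in initial), reverse=True)[:pool_size]
    for _ in range(iterations):
        # Pick a parent from the current pool and propose a modified candidate.
        _, parent = rng.choice(pool)
        child = propose(parent, rng)
        pool.append((evaluate(child), child))
        # Keep only the highest-scoring candidates (best first).
        pool = sorted(pool, reverse=True)[:pool_size]
    return pool


if __name__ == "__main__":
    # Toy task: the evaluation function peaks at x = 3.
    result = optimize_pool(
        initial=[0.0, 10.0],
        evaluate=lambda x: -(x - 3.0) ** 2,
        propose=lambda x, rng: x + rng.gauss(0.0, 0.5),
        iterations=200,
    )
    print(result[0])  # best (score, candidate) pair found
```

On this toy task, the best candidate in the returned pool moves toward 3 as iterations accumulate. Because the pool is trimmed only after each new candidate is scored, the best score never decreases. Keeping several candidates rather than a single incumbent lets weaker solutions survive as alternative starting points for later proposals.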
Supplementary Material: zip
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 20069