Keywords: superoptimization, large language model agent, scientific analysis workflow
Abstract: Data-driven scientific discovery relies on complex computational workflows to process large, high-dimensional experimental datasets. However, a fundamental bottleneck exists: adapting carefully engineered computational tools to the bespoke datasets studied by individual labs demands substantial manual tuning and custom code development, consuming weeks or months of expert time and slowing scientific progress. To address this bottleneck, we introduce agentic superoptimization, a new paradigm for leveraging generative AI to autonomously write customized code that can surpass human-expert-engineered solutions. We present SciOpt, a proof-of-concept agentic framework for superoptimizing data preparation functions directly within real-world, production-level scientific workflows, without requiring additional annotations or training. We validate our approach on challenging biology and medical imaging tasks, showing that our agent consistently outperforms expert baselines. Notably, our agent-generated code was successfully deployed into a production-level scientific pipeline. Our work lays the foundation for human-AI agent collaborative discovery in complex, real-world environments.
Submission Number: 30