DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively

ICLR 2026 Conference Submission911 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026ICLR 2026EveryoneRevisionsBibTeXCC BY 4.0
Keywords: Automated Scientific Discovery, Large Language Models (LLMs), AI Scientist
TL;DR: This is the first empirical demonstration of an AI that acts as an autonomous scientist to progressively push research frontiers, successfully discovering novel methods that outperform the human SOTA across multiple domains.
Abstract: While previous AI Scientist systems can generate novel findings, they often lack the focus to produce scientifically valuable contributions that address pressing human-defined challenges. We introduce DeepScientist, a system designed to overcome this by conducting goal-oriented, fully autonomous scientific discovery over month-long timelines. It formalizes discovery as a Bayesian Optimization problem, using a cumulative Findings Memory to intelligently balance the exploitation of promising avenues with the exploration of novel hypotheses. Consuming over 20,000 GPU hours, the system generated about 5,000 unique ideas and experimentally validated approximately 1100, ultimately surpassing human-designed 2025 state-of-the-art (SOTA) methods on three frontier AI tasks by 183.7\%, 1.9\%, and 7.9\%. Crucially, this was achieved by autonomously redesigning core methodologies, not merely recombining existing techniques. In a striking demonstration, the system achieved progress on AI text detection in just two weeks that is comparable to three years of cumulative human research. This work provides the first large-scale evidence of an AI achieving discoveries that progressively surpass human SOTA on scientific tasks, producing valuable findings that genuinely push the frontier forward. To facilitate further research into this process, we will open-source all experimental logs and system code.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 911
Loading