NeuriCo: Towards reliable AI scientists

Published: 30 May 2026, Last Modified: 30 May 2026, ICML2026-AI4Science Poster, CC BY 4.0
Track: Track 3: AI Scientist Proposal Competition
TL;DR: We propose a framework and development trajectory toward truly reliable AI scientists.
Abstract: To build AI systems that help with scientific research, we need to understand not just what they can do, but where they consistently fail. We present NeuriCo, an open-source AI co-scientist system that runs agents through a multi-stage pipeline of literature review, resource gathering, experiment execution, and analysis. Over 20 weeks of a community-driven weekly research competition, we completed 180 agent runs across 60+ research ideas in machine learning. Agents are strong at literature synthesis, data curation with smart filtering, and statistical analysis that honestly reports null results. They also show consistent failure modes that point to a deeper problem: they execute well, but they cannot judge research quality. We discuss what this means for system design and how NeuriCo is being extended to physics, chemistry, and other domains.
Keywords: AI Scientist
Submission Number: 178