Position: AI Scientists Are Not Yet Ready for Open-Ended and Fully Autonomous Scientific Discovery

ACL ARR 2026 May Submission 14692 Authors

26 May 2026 (modified: 19 Jun 2026), ACL ARR 2026 May Submission, CC BY 4.0
Keywords: AI scientists, LLM agents
Abstract: We argue that current AI scientist systems are not yet ready for open-ended and fully autonomous scientific discovery. Despite impressive capabilities in automating research workflows, these systems produce research-like artifacts rather than validated science: they optimize for surface plausibility while lacking the judgment, creativity, and real-world grounding essential to genuine discovery. Through systematic analysis and human evaluation, we identify three critical gaps: (1) the real-world environment gap, the absence of infrastructure for validating AI-generated hypotheses against physical reality; (2) the professional skills gap, a lack of deep domain expertise beyond general-purpose reasoning; and (3) the quality verification gap, a lack of scalable mechanisms for ensuring that AI-generated scientific claims are reliable, reproducible, and scientifically verifiable. We propose corresponding directions: scaling verifiable real-world research environments, cultivating domain-specific agent skills, and developing reliability-aware frameworks. Until these fundamental gaps are bridged, AI scientists should serve as collaborative partners amplifying human capabilities, not as autonomous researchers.
Paper Type: Long
Research Area: LLM agents
Research Area Keywords: agent evaluation, grounded agents, agent memory
Languages Studied: English
EMNLP 2026 AI Reviewing Experiment: yes
Submission Number: 14692