Track: Track 3: AI Scientist Proposal Competition
Abstract: Autonomous "AI scientists" built on large language models have largely been designed for rapid, single-session experimental loops with low operational risk. Computational physical science breaks this paradigm: molecular simulations run for hours in shared HPC queues, incorrect assumptions can waste days of compute, and realistic workflows span multiple interactive sessions and human handoffs. In this high-latency, high-stakes regime, today's AI scientists optimize for clever ideation rather than for durable, auditable execution sustained over wall-clock time.
We present Superscientist, an agentic harness that makes durability a first-class property of AI-driven research. Superscientist instantiates a durability contract between humans and agents through three integrated pillars. Specify: before any computation begins, a structured Socratic dialogue elicits experimental parameters, checks method applicability, and establishes agreed success criteria, producing a machine-verifiable execution contract that makes the agent's responsibilities explicit. Persist: the entire workflow is continuously materialized on disk as JSON state, progress logs, and bootstrap scripts, so that fresh sessions and subagents can reconstruct full context without hidden inheritance; this supports reproducibility, auditing, and cross-session attribution. Dispatch: clean-context subagents execute plan stages on heterogeneous compute backends (local, Slurm, PBS, LSF), and advancement is strictly gated by verification against the initial specification, which enables mid-campaign methodological corrections without loss of traceability.
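The Persist and Dispatch pillars can be illustrated with a minimal sketch. It assumes that the contract reduces to an ordered list of stages with numeric acceptance ranges and that workflow state is a single JSON file. `Contract`, `Criterion`, `load_state`, `save_state`, and `advance` are hypothetical names used only for illustration, not Superscientist's interface. Because every call reloads state from disk, a fresh session needs only the file to continue, and a stage cannot advance until its outputs pass the agreed checks.

```python
# Hypothetical sketch: a file-backed execution contract with verification-gated stages.
# Contract, Criterion, load_state, save_state, and advance are illustrative names,
# not Superscientist's actual interface.
import json
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Criterion:
    # Assumed form of a success criterion: a named output metric must lie in [low, high].
    metric: str
    low: float
    high: float

    def check(self, outputs: dict) -> bool:
        value = outputs.get(self.metric)
        return value is not None and self.low <= value <= self.high


@dataclass
class Contract:
    stages: list    # ordered stage names agreed on before any computation
    criteria: dict  # stage name -> list of Criterion objects


def load_state(path: str, contract: Contract) -> dict:
    # Rebuild context only from the file on disk, so a fresh session needs nothing else.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"current": contract.stages[0], "completed": [], "log": []}


def save_state(path: str, state: dict) -> None:
    # Write to a temporary file, then rename it over the old one, so an interrupted
    # write never leaves a truncated state file behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def advance(path: str, contract: Contract, outputs: dict) -> bool:
    """Advance past the current stage only if its outputs satisfy every criterion."""
    state = load_state(path, contract)
    stage = state["current"]
    if stage is None:  # all stages already completed
        return False
    failed = [c.metric for c in contract.criteria.get(stage, []) if not c.check(outputs)]
    # Record every attempt, passing or not, so later sessions can audit the history.
    state["log"].append({"stage": stage, "outputs": outputs, "failed": failed})
    if not failed:
        state["completed"].append(stage)
        idx = contract.stages.index(stage)
        state["current"] = contract.stages[idx + 1] if idx + 1 < len(contract.stages) else None
    save_state(path, state)
    return not failed
```

For example, with hypothetical stages `["equilibrate", "production"]` and a density window on `equilibrate`, an out-of-range result is logged and the stage stays current, while an in-range result moves the campaign to `production`. A real harness would also track scheduler job IDs and bootstrap scripts, which this sketch omits.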
We demonstrate Superscientist on three end-to-end molecular dynamics campaigns: polymer-melt viscosity via Green–Kubo, thermal conductivity of 3C-SiC, and Bayesian optimization of sequence-specific polymers for materials design. Across all three campaigns, we observe robust cross-session resumption, recovery from failures, and principled intervention when methods require revision. These case studies argue that the durability contract is a practical benchmark for autonomy, accountability, and co-authorship in AI-driven computational science, and that moving from "agents that propose experiments" to "agents that uphold an explicit, inspectable contract over time" is a necessary step toward deploying AI scientists as reliable collaborators in high-performance computational labs.
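For reference, the Green–Kubo route to shear viscosity named above integrates the equilibrium autocorrelation of an off-diagonal pressure-tensor component. The standard form follows, where $V$ is the system volume, $T$ the temperature, $k_B$ Boltzmann's constant, and $P_{xy}$ an off-diagonal component of the pressure tensor. This abstract does not specify how the campaign averages over components or truncates the integral.

```latex
\eta = \frac{V}{k_B T} \int_0^{\infty} \left\langle P_{xy}(0)\, P_{xy}(t) \right\rangle \, dt
```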
Keywords: agentic harness; autonomous scientific workflows; durability and reproducibility; human–AI co-authorship; molecular dynamics; high-performance computing; computational materials science
Submission Number: 85