An AI Scientist that Doesn't Drift: Taste, Structure, and Falsifiable Findings in a Quadruped Navigation Research Loop
Track: Track 1: Original Research/Position/Education/Attention Track
Keywords: An AI Scientist Guided by Encoded Human Research Preferences: A Case Study in Quadruped Neural Navigation
TL;DR: AI Scientists drift toward leaderboard tuning unless their loop has a structural place for taste; we operationalize taste as a preference oracle and produce falsifiable findings on quadruped navigation.
Abstract: Autonomous research loops driven by large language models can run machine-learning experiments at scale but tend to drift toward local refinements of whichever metric they optimise rather than testing the hypotheses that motivate the experiments. We address this structurally and present an AI Scientist for studying generalisation in quadruped robot navigation policies in simulation. Building on the autoresearch paradigm of Karpathy (2026), our loop adds three components: an immutable experiment card that pairs each iteration’s prediction with its outcome under a fixed schema, so a falsified hypothesis cannot be retroactively rewritten; specialised subagents restricted to mechanical roles; and kkanbu, a preference oracle that holds the user’s research taste as a typed knowledge graph and is the only component permitted to make subjective judgements. Across multiple experiment batches, the system produced mechanism-level findings that redirected the project, including identifying a previously overlooked bottleneck, recovering a previously retired paradigm, and falsifying one of its own hypotheses.
Submission Number: 247