Track: Track 1: Technical Foundations for a Post-AGI World
Keywords: Scalable Oversight, AI Alignment, Automated Scientific Discovery, Human-AI Collaboration, Scientific Validation, Weak-to-Strong Generalization, Process Supervision
TL;DR: We propose Recursive Oversight Decomposition (ROD), a framework for validating AI-generated scientific claims through explicit assumptions and robustness checks, shifting oversight from outputs to structured decomposition.
Abstract: As AGI becomes a pervasive tool for automated discovery, the human role in science shifts from generating claims to validating them. However, current oversight frameworks such as Iterated Distillation and Amplification (IDA) are designed for verifiable tasks, not open-ended scientific claims. We propose Recursive Oversight Decomposition (ROD), a framework designed to ensure trust in post-AGI science. ROD addresses the unique failure modes of automated discovery, such as hidden assumptions and brittleness, by explicitly enforcing "Scientific Completeness" criteria. We introduce ensemble decomposition to mitigate model bias and propose experiments to test this infrastructure in formal mathematics and biology. By shifting oversight from final outputs to the decomposition of evidence, ROD provides a concrete mechanism for meaningful human-AI collaboration, ensuring that superhuman capabilities remain tethered to human-verifiable validity.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Presenter: ~Abdul_Wahid3
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 44