## AI Research Autonomy Disclosure

This project is *AI-led*. The Author, Reviewer, Reviser, and Meta-Reviewer roles were instantiated as LLM agents. All core research activities, including topic generation, method design, experiment execution, analysis, writing, and visualization, were carried out by AI agents under a closed-loop protocol. Human contributors provided minimal oversight for safety, resource provisioning, and compliance checks, and did not author technical content. Prompts, seeds, and model identifiers are released; no external datasets were used.

## Reproducibility Statement

All results in this submission were produced by an autonomous multi-agent workflow in which large language models act as ***Author***, ***Reviewers***, ***Reviser***, and ***Meta-Reviewer***. To enable auditing and regeneration, we release as supplementary materials (i) code to execute the end-to-end pipeline (author $\rightarrow$ reviewers $\rightarrow$ reviser $\rightarrow$ meta-reviewer) and to compute all reported metrics and tables; (ii) the exact role prompts; (iii) fixed random seeds and temperatures; and (iv) intermediate artifacts for every step (manuscripts $\mathbf{M}^{(t)}$, structured reviews $\mathbf{r}^{(t)}_{j}$, response letters $\mathbf{L}^{(t)}$, and meta decisions $d^{(t)}$), serialized as JSONL with UTC timestamps. Each run additionally includes resource logs (tokens and latency per step) and documented configuration (reviewer count, thresholds, and criterion weights) so that independent readers can verify the aggregation and decision rules. A one-click script reproduces a minimal run (three topics, two rounds) from the released seeds and regenerates all tables; the same script can be scaled to the full topic bank. Because LLM sampling is stochastic, we report means and 95\% confidence intervals across topics and fix seeds for comparability in ablations. No external datasets are required. Before review, manuscripts are sanitized to remove hidden directives (e.g., zero-width characters, HTML/LaTeX comments, link titles). Together, these materials permit re-execution, inspection, and reuse of our AI-generated results.
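As an illustration of the sanitization step mentioned above, the following is a minimal sketch of how hidden directives might be stripped from a manuscript before review. It is not the released pipeline code: the function name `sanitize`, the set of zero-width code points, and the regular expressions are assumptions chosen for the example, and a production filter would likely need a broader character list and format-aware parsing.

```python
import re

# Invisible code points commonly used to hide text (illustrative list,
# an assumption of this sketch rather than the pipeline's actual list).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# HTML comments may span several lines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# A LaTeX comment starts at an unescaped '%' and runs to end of line.
# Simplification: a '%' preceded by an escaped backslash ('\\%') is
# treated as escaped and left in place.
LATEX_COMMENT = re.compile(r"(?<!\\)%.*")

# Markdown link title: [text](url "title") is reduced to [text](url).
LINK_TITLE = re.compile(r'(\]\([^)\s]+)\s+"[^"]*"\)')


def sanitize(text: str) -> str:
    """Remove zero-width characters, HTML/LaTeX comments, and link titles."""
    text = text.translate(ZERO_WIDTH)    # delete invisible characters
    text = HTML_COMMENT.sub("", text)    # drop <!-- ... -->
    text = LATEX_COMMENT.sub("", text)   # drop % ... to end of line
    text = LINK_TITLE.sub(r"\1)", text)  # keep the URL, drop the title
    return text
```

Escaped percent signs such as `95\%` are left intact because the LaTeX pattern only matches a `%` that is not preceded by a backslash. Note that this simple pattern would also truncate a line at a percent-encoded URL, so a real filter would need to treat links separately.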

## Responsible AI Statement

We adhere to the NeurIPS Code of Ethics. Risks include (i) bias in automated judgments (order, verbosity, model-family effects), (ii) vulnerability to document-borne prompt injections, and (iii) over-interpretation of AI-generated outputs. Mitigations include role separation and reliability-aware aggregation; bias diagnostics and transparent reporting; manuscript sanitization (removing hidden HTML/LaTeX directives, zero-width characters, suspicious links); and provenance logging (model family/version, temperature, seeds, prompt and redaction hashes). The system is not intended to replace human peer review in real venues; deployment should include human oversight.
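To make the provenance-logging mitigation concrete, here is a hedged sketch of one way a per-step record could be assembled and serialized as a JSONL line. The field names, the helper functions `sha256_hex`, `provenance_record`, and `to_jsonl_line`, and the choice of SHA-256 are assumptions for illustration; in particular, "redaction hash" is interpreted here as a hash of the sanitized manuscript text, which the statement above does not specify.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoding of `text`."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_record(role, model_family, model_version,
                      temperature, seed, prompt, redacted_text):
    """Build one provenance entry (hypothetical schema)."""
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "model_family": model_family,
        "model_version": model_version,
        "temperature": temperature,
        "seed": seed,
        # Hashes let auditors confirm which prompt and sanitized text
        # were used without embedding them in every log line.
        "prompt_sha256": sha256_hex(prompt),
        "redaction_sha256": sha256_hex(redacted_text),
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize a record as a single JSON line (no trailing newline)."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)
```

A record built this way can be appended to a log file one line at a time, and an auditor can recompute the hashes from the released prompts to check that they match.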