\section*{Agents4Science AI Checklist}

\textbf{Disclosure.} We have disclosed all uses of LLMs for ideation, editing, and code assistance (see AI Contribution Disclosure).

\textbf{Human oversight.} All LLM outputs were reviewed and edited by the authors; scientific choices (methods, hyperparameters, evaluations) were made by the authors.

\textbf{Safety review.} We considered misuse risks from acquisition policies and included calibration analysis and ablations to reduce overconfidence.

\textbf{Reproducibility.} We use deterministic hashing and fixed random seeds; a \texttt{make reproduce} target runs the full pipeline and regenerates the metrics and plots.

\textbf{Data governance.} We use only synthetic data; dataset generation and licensing are documented in the data card; the data contain no personally identifiable information (PII).

\textbf{Release plan.} We release code, configuration, and figures sufficient to reproduce the paper; releasing weights is unnecessary because the models are lightweight baselines.
