Learning Clinical-Trial Strategy: Offline Policy Training for Decision Agents

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: GenBio 2026 Spotlight
Readers: Everyone
License: CC BY 4.0
Keywords: Agentic AI, decision agents, LLM, sequential decision-making, offline policy learning, clinical trials, strategy
TL;DR: We train decision agents offline on 881 oncology drug-program episodes assembled from 31.7k public records and find they outperform frontier tool-using LLM agents, especially on contamination-clean post-cutoff windows.
Abstract: Clinical development is a sequential decision-making problem under uncertainty, in which a sponsor must plan a portfolio of experiments from heterogeneous biomedical evidence. We frame oncology clinical development as an offline decision-making problem in which an agent predicts an oncology drug program's trial portfolio for the next six months using only information available at the decision date. To support this, we construct a temporal dataset that combines 31.7k heterogeneous public data records, including trial registries, regulatory reviews, sponsor filings, utilization data, and epidemiology, into 881 offline decision episodes across 45 historical programs. We compare behavioral cloning, reward-weighted behavioral cloning, and learned-reward training against four frontier LLM agents that share a common date-gated retrieval scaffold, evaluating on held-out drug, sponsor, drug-class, and temporal splits. Adapters trained offline outperform every non-fine-tuned baseline. On the contamination-clean post-August 2025 holdout, offline training reaches 39.9% Indication F1 against 11.2% for the strongest tool agent, suggesting that structured offline learning can teach agents to plan clinical experiments.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 89
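Illustrative sketch (not from the paper): the abstract mentions a date-gated retrieval scaffold and an Indication F1 score but does not define either. The Python below is a minimal sketch of how the two ideas could look, assuming that date gating means dropping any record published after the decision date and that Indication F1 is a set-overlap F1 between predicted and actual indication labels. The names `Record`, `date_gate`, and `set_f1` are hypothetical, and the authors' actual definitions may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical public record tagged with the date it became available."""
    published: date
    text: str


def date_gate(records, decision_date):
    """Keep only records available on or before the decision date."""
    return [r for r in records if r.published <= decision_date]


def set_f1(predicted, actual):
    """Set-overlap F1 between predicted and actual indication labels."""
    predicted, actual = set(predicted), set(actual)
    if not predicted and not actual:
        return 1.0  # assumption: two empty portfolios count as a perfect match
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)


# Made-up example data, for illustration only.
records = [
    Record(date(2024, 1, 10), "registry entry"),
    Record(date(2024, 9, 1), "later sponsor filing"),
]
visible = date_gate(records, date(2024, 6, 30))
score = set_f1({"NSCLC", "melanoma"}, {"NSCLC", "breast cancer"})
```

In this toy example only the first record is visible at the decision date, and the prediction shares one of two labels with the actual portfolio, so precision and recall are both 0.5.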