WorldFork: Trace-Auditable Forecasting Agents in Open-Ended Domains
Keywords: LLM agents, forecasting agents, agent evaluation, trace auditability, long-horizon reliability, branching rollouts, endpoint ledgers, provenance
TL;DR: WorldFork evaluates open-ended forecasting agents by pairing proper forecast scores with auditable branch traces that expose assumptions, unresolved mass, and unsupported mechanism claims.
Abstract: Open-ended forecasting agents fail not only through wrong final probabilities, but also through hidden assumptions, unsupported evidence claims, and long-horizon trace contamination. We introduce WorldFork, a trace-auditable agent workflow that turns a public event card into a multiverse of branching timelines with actors, constraints, candidate endpoints, branch lineage, path mass, endpoint ledgers, and report provenance. This forecast object lets reviewers inspect where decomposition, branch generation, timeline evolution, ledger updates, and final extraction support or undermine the reported probability. On 24 masked retrospective resolved-event cards, unconditional branching reduces WorldFork Brier score from 0.282 to 0.214, improves log score from 0.725 to 0.581, and yields lower per-card Brier loss than no-branch rollouts on 17 of 24 cases; a fixed 50/50 blend with a direct JSON forecast reaches Brier 0.205. These results are pilot evidence rather than a confirmatory benchmark: retrospective masking only partially controls leakage, the sign test is suggestive but not significant ($p=0.064$), and a card-bootstrap confidence interval includes zero. The main contribution is therefore an evaluation protocol for real-world forecasting agents that pairs proper scores with trace-level audits of assumptions, unresolved mass, and mechanism claims.
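The scoring arithmetic quoted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, the log score is assumed to be the mean negative natural-log likelihood (lower is better, consistent with the reported improvement from 0.725 to 0.581), and the sign test is assumed to be an exact two-sided binomial test with no tied cards.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_score(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood (assumed convention; lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def blend(probs_a, probs_b, weight=0.5):
    """Fixed linear blend of two forecast lists (weight=0.5 gives 50/50)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(probs_a, probs_b)]


def sign_test_two_sided(wins, n):
    """Exact two-sided sign test p-value under H0: P(win) = 0.5, no ties."""
    k = max(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# 17 of 24 cards favouring branching, assuming the other 7 are losses.
p_value = sign_test_two_sided(17, 24)
```

Under these assumptions, 17 wins out of 24 cards gives a two-sided p-value near 0.064, in line with the figure reported in the abstract. A constant 0.5 forecast scores a Brier of 0.25 and a log score of about 0.693, which gives a reference point for reading the reported numbers.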
Track: Short Paper (4 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 110