Grounded-Retrieval Adversarial Imitation Loop: Integrating Language, Agent, and World Models

Published: 23 Sept 2025, Last Modified: 22 Nov 2025, LAW, CC BY 4.0
Keywords: large language models, LLM agents, ReAct (reason-and-act), computational social science, world models, adversarial imitation learning (GAIL), discriminator-shaped rewards, GRPO / policy optimization
TL;DR: Introduces GRAIL, a grounded simulation loop that (1) retrieves a slate of real next actions from behavior logs, (2) lets a ReAct-style LM pick one, and (3) feeds it to a counterfactual predictor.
Abstract: We present GRAIL, a grounded simulation framework that unifies Language, Agent, and World models. GRAIL retrieves candidate real next-actions from large behavior logs, lets a reason-and-act (ReAct) LM select among them, and learns a world model that predicts counterfactual outcomes. To support short-term alignment, a GAIL-style discriminator supplies an adversarial reward that aligns agent trajectories to human occupancy measures while mitigating the homogenization artifacts of “silicon survey” prompting. On YouTube benchmarks, GRAIL improves (i) trajectory fidelity to human behavior and (ii) long-horizon stability versus Language-only and Action-only baselines. We release code, slates, and evaluation scripts to catalyze grounded, agentic world-modeling research.
Submission Type: Research Paper (4-9 Pages)
Submission Number: 98
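
The abstract describes a retrieve/select/predict loop with a discriminator-shaped reward. Below is a minimal sketch of one such rollout step; the interfaces (retrieve_slate, react_select, world_predict, score_human) are illustrative placeholders assumed for this sketch, not the authors' released API.

```python
# Minimal sketch of one GRAIL-style rollout step under assumed interfaces.
# All callables below are hypothetical stand-ins, not the released code.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple
import math


@dataclass
class Action:
    """A candidate next action retrieved from real behavior logs."""
    item_id: str
    features: List[float] = field(default_factory=list)


def grail_step(
    state: dict,
    retrieve_slate: Callable[[dict, int], List[Action]],    # (1) grounded retrieval
    react_select: Callable[[dict, List[Action]], Action],   # (2) ReAct-style LM choice
    world_predict: Callable[[dict, Action], dict],          # (3) counterfactual world model
    score_human: Callable[[dict, Action], float],           # discriminator D(s, a) in (0, 1)
    k: int = 8,
) -> Tuple[dict, Action, float]:
    """One grounded simulation step: retrieve -> select -> predict -> reward."""
    slate = retrieve_slate(state, k)            # real next actions from behavior logs
    action = react_select(state, slate)         # LM reasons over the slate and picks one
    next_state = world_predict(state, action)   # counterfactual outcome prediction

    # GAIL-style adversarial reward: larger when the discriminator believes the
    # (state, action) pair came from a human trajectory; r = -log(1 - D(s, a))
    # is one common shaping (Ho & Ermon, 2016).
    d = score_human(state, action)
    reward = -math.log(max(1.0 - d, 1e-8))
    return next_state, action, reward
```

The -log(1 - D) form is a standard GAIL reward shaping; whether GRAIL uses this exact shaping, log D, or a GRPO-normalized advantage is not specified in the abstract.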