G-Sim: Generative Simulations with Large Language Models and Gradient-Free Calibration

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We build trustworthy, intervenable simulators by combining a Large Language Model's ability to propose a system's structure with flexible, gradient-free calibration against real-world data.
Abstract: Constructing robust simulators is essential for asking "what if?" questions and guiding policy in critical domains like healthcare and logistics. However, existing methods often struggle, either failing to generalize beyond historical data or, when using Large Language Models (LLMs), suffering from inaccuracies and poor empirical alignment. We introduce **G-Sim**, a hybrid framework that automates simulator construction by synergizing LLM-driven structural design with rigorous empirical calibration. G-Sim employs an LLM in an iterative loop to propose and refine a simulator's core components and causal relationships, guided by domain knowledge. This structure is then grounded in reality by estimating its parameters using flexible calibration techniques. Specifically, G-Sim can leverage methods that are both **likelihood-free** and **gradient-free** with respect to the simulator, such as **gradient-free optimization** for direct parameter estimation or **simulation-based inference** for obtaining a posterior distribution over parameters. This allows it to handle non-differentiable and stochastic simulators. By integrating domain priors with empirical evidence, G-Sim produces reliable, causally-informed simulators, mitigating data-inefficiency and enabling robust system-level interventions for complex decision-making.
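To make the calibration step concrete, below is a minimal, hypothetical sketch of likelihood-free, gradient-free parameter estimation for a black-box stochastic simulator: an evolutionary optimizer (SciPy's `differential_evolution`) tunes the simulator's parameters so that simulated summary statistics match observed ones. The toy queue simulator, the summary statistics, and the parameter bounds are illustrative placeholders, not the released G-Sim implementation.

```python
# Illustrative sketch (not the paper's implementation): gradient-free calibration
# of a black-box stochastic simulator by matching summary statistics of real data.
import numpy as np
from scipy.optimize import differential_evolution  # gradient-free optimizer

rng = np.random.default_rng(0)

def simulator(params, n_steps=200):
    """Hypothetical queue-like simulator: arrival and service rates are the
    unknown parameters; returns a trajectory of queue lengths."""
    arrival_rate, service_rate = params
    queue, traj = 0, []
    for _ in range(n_steps):
        queue += rng.poisson(arrival_rate)                  # stochastic arrivals
        queue = max(queue - rng.poisson(service_rate), 0)   # stochastic service
        traj.append(queue)
    return np.asarray(traj)

def summary(traj):
    # Low-dimensional summaries make the discrepancy robust to simulator noise.
    return np.array([traj.mean(), traj.std(), np.quantile(traj, 0.9)])

# "Real" observations; in practice these come from the target system.
observed_summary = summary(simulator((3.0, 2.5)))

def discrepancy(params):
    # Average over replicates because the simulator is stochastic.
    sims = np.stack([summary(simulator(params)) for _ in range(8)])
    return float(np.linalg.norm(sims.mean(axis=0) - observed_summary))

# Differential evolution needs only forward simulations: no likelihoods, no gradients.
result = differential_evolution(
    discrepancy, bounds=[(0.1, 10.0), (0.1, 10.0)], seed=0, maxiter=30
)
print("calibrated (arrival_rate, service_rate):", result.x)
```

When a full posterior over parameters is preferred to a point estimate, the same likelihood-free setup can instead be handed to a simulation-based inference method (e.g., approximate Bayesian computation or neural posterior estimation), as described in the abstract.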
Lay Summary: Making smart decisions for complex systems, like managing hospital capacity or a company's supply chain, often requires asking "what if...?" questions. We rely on computer simulations to explore different scenarios, but building accurate ones is a major challenge. Current methods often fail in two ways: some are stuck in the past, unable to predict new situations they haven't seen in the data, while others that use creative AI like Large Language Models (LLMs) can be unreliable and invent details that don't match reality. Our work, G-Sim, introduces a hybrid approach that gets the best of both worlds. First, we use an LLM's vast knowledge to sketch a basic blueprint of the system, outlining its main parts and how they connect. Then, in a crucial second step, we use flexible algorithms to automatically tune this blueprint, adjusting all its specific numbers until the simulation's behavior accurately matches real-world data. If the simulation is still flawed, G-Sim identifies the problem and asks the LLM to propose a better blueprint, repeating the process until the model is right. This method results in trustworthy digital replicas of complex systems. It empowers decision-makers—from hospital administrators to city planners—to safely and reliably test the consequences of their choices before implementing them in the real world, leading to better, safer, and more informed strategies.
Link To Code: https://github.com/samholt/generative-simulations
Primary Area: General Machine Learning->Everything Else
Keywords: Large Language Models, Generative Simulation, World Models, Simulation-Based Inference, Gradient-Free Optimization, Likelihood-Free Inference, Hybrid Models, Causal Inference
Flagged For Ethics Review: true
Submission Number: 7465