Keywords: LLM Agents, Simulation Environment, Dynamic Environment
TL;DR: RAISE is a simulation-first framework that builds realistic enterprise environments to safely train, evaluate, and optimize AI agents using prompts, supervised fine-tuning, and reinforcement learning.
Abstract: AI agents hold great value for enterprises but are notoriously difficult to productionize in a robust and reliable way. Enterprise agents must internalize task context, constraints, and success metrics to be reliable, yet learning directly in production is risky and often infeasible. We present RAISE, a simulation-first experiential learning framework for training and evaluating domain-specific AI agents through simulated experience. RAISE constructs high-fidelity interactive environments that mirror target deployments, including tool APIs, data schemas, user behaviors, and organizational policies. The system generates executable tool-calling and user-simulation traces and logs replayable trajectories. A hybrid evaluation layer provides dense, verifiable signals from task-specific checkers and rubric-driven LLM-as-a-judge assessments. The framework is optimizer-agnostic and supports multiple post-training paths, including supervised fine-tuning on simulated transcripts, reinforcement learning with online rollouts and trajectory replay, and iterative prompt or policy optimization.
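The abstract's hybrid evaluation layer — combining a dense, verifiable task-specific checker with a rubric-driven LLM-as-a-judge score — could be sketched as below. This is a minimal illustration, not RAISE's actual API: the `Trajectory` fields, the `lookup_order` tool name, the rubric format, and the weighting are all assumptions, and the judge is stubbed with keyword matching where a real system would prompt an LLM.

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    """A replayable log of one simulated episode (fields are illustrative)."""
    tool_calls: list
    final_answer: str

def task_checker(traj: Trajectory) -> float:
    """Verifiable signal: did the agent call the required tool and produce an answer?"""
    called_required = any(c["name"] == "lookup_order" for c in traj.tool_calls)
    answered = bool(traj.final_answer.strip())
    return 0.5 * called_required + 0.5 * answered

def llm_judge(traj: Trajectory, rubric: list) -> float:
    """Stub for a rubric-driven LLM-as-a-judge: here, the fraction of rubric
    keywords present in the answer; a real judge would query an LLM."""
    hits = sum(1 for item in rubric if item in traj.final_answer.lower())
    return hits / len(rubric) if rubric else 0.0

def hybrid_reward(traj: Trajectory, rubric: list,
                  w_check: float = 0.7, w_judge: float = 0.3) -> float:
    """Weighted blend of the two signals; weights are arbitrary here."""
    return w_check * task_checker(traj) + w_judge * llm_judge(traj, rubric)

traj = Trajectory(
    tool_calls=[{"name": "lookup_order", "args": {"id": "A-17"}}],
    final_answer="Your order A-17 ships tomorrow via express courier.",
)
print(hybrid_reward(traj, rubric=["order", "ships"]))  # → 1.0
```

A dense scalar of this shape is what makes the abstract's multiple post-training paths interchangeable: the same reward can label transcripts for supervised fine-tuning, score online rollouts for reinforcement learning, or rank candidates during prompt optimization.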
Archival Option: The authors of this submission want it to appear in the archival proceedings.
Submission Number: 121