Beyond Static Evaluation: Building Simulation Environments for Scalable Agentic Reinforcement Learning

Akshay Arora; Ishan Nigam; Ashutosh Aggarwal; Shefali Bansal; Krishna Kumar Singh; Sweta Kumari; Nikhil Mittal; MD SHARIQ FARHAN; Siddarth Malreddy

Published: 23 May 2026, Last Modified: 23 May 2026

Venue: ACM CAIS 2026: RLEval Workshop Poster

License: CC BY 4.0

Keywords: Agentic Reinforcement Learning, Simulation Environments, Verifiable Rewards, Execution Traces, Autonomous Agent Evaluation

TL;DR: This paper introduces AgenticAI-Supervisor, a scalable simulation environment that evaluates autonomous agents using verifiable execution traces and multi-dimensional rewards to overcome the limitations of static benchmarks.

Abstract: As Large Language Models (LLMs) evolve into autonomous agents, traditional static evaluation fails to capture multi-step decision-making. We introduce AgenticAI-Supervisor, an API- and UI-driven RL Gym environment that decouples environment creation from scalable execution. By scoring agents on verifiable execution outcomes, the platform generates high-fidelity traces and applies multi-dimensional reward shaping. Critically, our framework mitigates reward hacking through rigorous internal state validation and testing. This work provides a first look at the platform's core capabilities through a Customer Support Agent case study that demonstrates a consistent closed feedback loop for model optimization. Future work will focus on advanced features such as Computer Use, Tool Use, automated "stumping", and edge-case generation.
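
The abstract does not describe the platform's reward interface, so the following is only a minimal sketch of what a multi-dimensional reward with internal state validation could look like. Every name here (`Trace`, `score_trace`, the three reward dimensions, and the weights) is a hypothetical illustration and not AgenticAI-Supervisor's actual API. The idea shown is that the outcome score comes from checking the environment's final state rather than trusting the agent's own claim of success.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical execution trace for one episode."""
    steps: list            # actions the agent took, in order
    final_state: dict      # environment state after the last step
    claimed_success: bool  # what the agent reported about itself


def score_trace(trace, expected_state, max_steps=10):
    """Return (total, components) for a trace.

    The outcome is verified against the environment's final state,
    so an agent that merely claims success earns no outcome reward.
    Weights are illustrative assumptions, not values from the paper.
    """
    verified = all(
        trace.final_state.get(key) == value
        for key, value in expected_state.items()
    )
    components = {
        # 1.0 only if the environment state matches the expected outcome
        "outcome": 1.0 if verified else 0.0,
        # fewer steps score higher, clipped at zero
        "efficiency": max(0.0, 1.0 - len(trace.steps) / max_steps),
        # penalize a mismatch between the claim and the verified state
        "honesty": 1.0 if trace.claimed_success == verified else 0.0,
    }
    weights = {"outcome": 0.6, "efficiency": 0.2, "honesty": 0.2}
    total = sum(weights[name] * components[name] for name in weights)
    return total, components


# A customer-support style example: the ticket must end up refunded.
expected = {"ticket_status": "refunded"}
honest = Trace(
    steps=["lookup_order", "issue_refund"],
    final_state={"ticket_status": "refunded"},
    claimed_success=True,
)
hacked = Trace(
    steps=["reply_done"],
    final_state={"ticket_status": "open"},
    claimed_success=True,
)
honest_total, honest_parts = score_trace(honest, expected)
hacked_total, hacked_parts = score_trace(hacked, expected)
```

In this sketch the second trace finishes in fewer steps and reports success, yet it scores lower overall because the validated state does not match the expected outcome.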

Submission Number: 12