MORPHEUS: A Persistent Enterprise Benchmark for Continual RL in the Big World

Published: 10 Jun 2026, Last Modified: 10 Jun 2026
Venue: RL in Big Worlds Poster
License: CC BY 4.0
Keywords: Continual Reinforcement Learning, Big World Hypothesis, Benchmark
TL;DR: A new benchmark for Continual Reinforcement Learning in Big Worlds.
Abstract: We introduce Morpheus, a persistent enterprise simulation platform for continual reinforcement learning (CRL) research. Existing reinforcement learning (RL) benchmarks are episodic, low-dimensional, and stationary by design, properties at odds with the conditions that deployed real-world systems face. Grounded in the Big World Hypothesis, Morpheus provides environments in which the world never resets to an initial state, objectives shift over time, and past decisions have compounding consequences. Morpheus comprises four enterprise simulation environments; this paper evaluates two of them, drawn from outbound logistics and inbound warehouse operations. Each exhibits structured non-stationarity through a parameterisable failure injection engine and an asynchronous configuration shift controller. Policies are initialised via supervised fine-tuning on API-collected trajectories and subsequently trained with PPO-based reinforcement learning. Rewards are computed from operational verifiers embedded in the platform: structured failure event signals, financial ledger status, and resource throughput. We define a formal benchmark specification, propose a six-metric evaluation protocol (per-configuration reward, adaptation speed, forgetting, recovery time, stability, and the performance gap relative to a configuration-specific theoretical upper bound), and establish baseline results across four algorithm families: standard RL, replay-based, regularisation-based, and latent context modelling.
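The abstract names the six evaluation metrics but does not define them on this page. As a minimal sketch only, the code below shows one plausible way to compute each metric from a logged, never-resetting reward stream, assuming common continual-learning conventions (for example, forgetting as the drop from the best earlier score to the latest score). The `Segment` structure, all function names, the `threshold` and `window` parameters, and the config id in the usage example are hypothetical; this is not the Morpheus API or the paper's exact metric definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Segment:
    """A stretch of the never-resetting stream under one configuration (hypothetical)."""
    config_id: str
    rewards: list[float]   # per-step reward while this configuration was active
    upper_bound: float     # assumed configuration-specific theoretical upper bound


def per_config_reward(seg: Segment) -> float:
    """Mean reward obtained while the configuration was active."""
    return mean(seg.rewards)


def adaptation_speed(seg: Segment, threshold: float = 0.9) -> int | None:
    """Steps after the shift until reward first reaches threshold * upper bound.

    Returns None if the policy never gets there within the segment.
    """
    target = threshold * seg.upper_bound
    for step, reward in enumerate(seg.rewards):
        if reward >= target:
            return step
    return None


def forgetting(eval_history: dict[str, list[float]]) -> float:
    """Average drop from the best earlier evaluation to the latest one, per configuration.

    Negative values mean the policy improved on old configurations.
    Configurations evaluated fewer than twice are skipped.
    """
    drops = [max(scores[:-1]) - scores[-1]
             for scores in eval_history.values() if len(scores) >= 2]
    return mean(drops) if drops else 0.0


def recovery_time(rewards: list[float], failure_step: int, window: int = 5) -> int | None:
    """Steps from an injected failure until reward returns to its pre-failure baseline.

    The baseline is the mean reward over the `window` steps before the failure.
    Returns None if the reward never recovers within the logged stream.
    """
    if failure_step < 1:
        raise ValueError("need at least one step before the failure to form a baseline")
    baseline = mean(rewards[max(0, failure_step - window):failure_step])
    for step in range(failure_step, len(rewards)):
        if rewards[step] >= baseline:
            return step - failure_step
    return None


def stability(seg: Segment) -> float:
    """Population standard deviation of reward within the segment (lower is steadier)."""
    return pstdev(seg.rewards)


def performance_gap(seg: Segment) -> float:
    """Distance between the assumed upper bound and the mean reward achieved."""
    return seg.upper_bound - per_config_reward(seg)


if __name__ == "__main__":
    # Hypothetical segment: reward climbs after a configuration shift.
    seg = Segment("cfg_a", [0.2, 0.5, 0.95, 1.0], upper_bound=1.0)
    print(adaptation_speed(seg))  # 2
    # Failure injected at step 3; reward is back to baseline two steps later.
    print(recovery_time([1.0, 1.0, 1.0, 0.1, 0.4, 1.0], failure_step=3, window=3))  # 2
```

Instantaneous reward is used for adaptation speed and recovery time to keep the sketch short; a real protocol would more likely smooth over a window, and the paper may normalise the performance gap or forgetting differently.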
Submission Number: 16