TRAM: Test-time Risk Adaptation with Mixture of Agents

ICLR 2026 Conference Submission 13204 Authors

18 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: reinforcement learning, risk-aware RL, test-time adaptation, mixture of agents, occupancy measures, safety
TL;DR: TRAM aligns test-time decisions by mixing agents using a return–risk tradeoff from occupancy-based metrics, improving safety without extra training and outperforming baselines across varied risk conditions.
Abstract: Deployed reinforcement learning agents must satisfy safety requirements that emerge only at test time—evolving regulations, unexpected hazards, or shifted operational priorities. Current risk-aware methods embed fixed risk models (typically return variance) during training, but this approach suffers from two fundamental limitations: it restricts risk expressiveness to trajectory-level statistics, and it induces uniform conservatism that reduces the behavioral coverage needed for effective deployment adaptation. We propose **TRAM** (Test-time Risk Adaptation with Mixture of Agents), a deployment-time framework that composes risk-neutral source policies to satisfy arbitrary risk specifications without retraining. TRAM represents risk through occupancy-based functionals that capture spatial constraints, behavioral drift, and local volatility—risk types that trajectory variance cannot encode. Our theoretical analysis provides localized performance bounds that cleanly separate reward transfer quality from risk alignment costs, and proves that risk-neutral source training is minimax optimal for deployment risk adaptation. Empirically, TRAM delivers superior safety-performance trade-offs across gridworld, continuous control, and large language model domains while maintaining computational efficiency through a successor feature implementation.
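For intuition only, below is a minimal, hypothetical sketch (not the authors' implementation) of the kind of test-time mixing the abstract describes: each risk-neutral source policy's expected return is estimated from successor features, an occupancy-based functional scores its risk (here, expected hazard cost under its state-occupancy measure), and the deployed agent selects the source policy with the best return–risk tradeoff. All names and shapes (`psi`, `w_r`, `rho`, `cost`, `lam`, `select_agent`) are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Assumed quantities (illustrative, not from the paper):
#   psi[i] : successor features of source policy i, shape (n_states, d)
#   w_r    : reward weights, so expected return of policy i at state s is psi[i][s] @ w_r
#   rho[i] : state-occupancy measure of source policy i, shape (n_states,)
#   cost   : per-state risk cost (e.g., a spatial hazard indicator), shape (n_states,)
#   lam    : tradeoff coefficient between return and occupancy-based risk

def occupancy_risk(rho_i, cost):
    """Occupancy-based risk functional: expected cost under the policy's state occupancy."""
    return float(rho_i @ cost)

def select_agent(state, psi, w_r, rho, cost, lam):
    """Pick the source policy maximizing (estimated return) - lam * (occupancy risk) at `state`."""
    scores = [psi_i[state] @ w_r - lam * occupancy_risk(rho_i, cost)
              for psi_i, rho_i in zip(psi, rho)]
    return int(np.argmax(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_states, d, n_agents = 10, 4, 3
    psi = [rng.normal(size=(n_states, d)) for _ in range(n_agents)]
    w_r = rng.normal(size=d)
    rho = [np.abs(rng.normal(size=n_states)) for _ in range(n_agents)]
    rho = [r / r.sum() for r in rho]                    # normalize to occupancy distributions
    cost = (rng.random(n_states) > 0.7).astype(float)   # hazard indicator over states
    print("chosen agent at state 0:", select_agent(0, psi, w_r, rho, cost, lam=2.0))
```

Because the risk term depends only on occupancy measures and a deployment-time cost, the tradeoff coefficient and the cost itself can be changed at test time without retraining any of the source policies, which is the adaptation property the abstract emphasizes.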
Primary Area: reinforcement learning
Submission Number: 13204