Track: Position paper track
Published Or Accepted: false
Keywords: Reinforcement Learning, Rashomon Effect, Behavioral Diversity, Interpretability, Alignment
TL;DR: This paper extends the Rashomon Effect to reinforcement learning, formalizing how distinct policies can achieve similar performance and showing that analyzing this inherent multiplicity can enhance robustness, interpretability, and alignment in RL.
Abstract: This paper extends the Rashomon Effect to Reinforcement Learning (RL), leveraging it as a framework for analyzing behaviorally diverse yet performance-equivalent policies.
We begin by formalizing two analogies: between datasets and environments, and between losses and rewards.
These analogies let us define the \emph{Rashomon set of RL agents} as the set of policies that achieve comparable returns while differing in behavior.
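As a minimal sketch (our notation, not necessarily the paper's): given an environment $\mathcal{M}$, a policy class $\Pi$, expected return $J_{\mathcal{M}}(\pi)$, and a tolerance $\epsilon \ge 0$, such a set can be written as
\[
\mathcal{R}_{\epsilon}(\mathcal{M}, \Pi) = \bigl\{ \pi \in \Pi : J_{\mathcal{M}}(\pi) \ge \max_{\pi' \in \Pi} J_{\mathcal{M}}(\pi') - \epsilon \bigr\},
\]
with behavioral differences measured separately, e.g.\ via a divergence between the policies' state-visitation distributions.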
This framing highlights multiplicity as an inherent property of learning rather than stochastic noise, with implications for alignment, interpretability, and retraining.
We further extend the concept to multi-criteria settings, showing how multiple overlapping equivalence criteria reveal structured diversity within policy spaces.
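Under the same assumed notation, one hedged reading of this extension: given criteria $J_1, \dots, J_K$ with tolerances $\epsilon_1, \dots, \epsilon_K$, the overlapping equivalence sets intersect as
\[
\mathcal{R}_{\boldsymbol{\epsilon}} = \bigcap_{k=1}^{K} \bigl\{ \pi \in \Pi : J_k(\pi) \ge \max_{\pi' \in \Pi} J_k(\pi') - \epsilon_k \bigr\},
\]
so that structured diversity appears as the geometry of these intersecting near-optimal regions.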
Viewing RL through the Rashomon lens encourages systematic study of behavioral multiplicity as a foundation for more robust, interpretable, and human-aligned agents.
Submission Number: 11