Multi-Perspective Actor-Critic: Adaptive Value Decomposition for Robust and Safe Reinforcement Learning
Keywords: Reinforcement Learning, Robust Reinforcement Learning, Multi-Objective Reinforcement Learning, Value Decomposition, Safe Reinforcement Learning
Abstract: Real-world deployment of reinforcement learning requires simultaneously handling multiple objectives, safety constraints, and model uncertainty, yet existing methods address these challenges in isolation. We present Multi-Perspective Actor-Critic (MPAC), a novel framework that integrates all three aspects. MPAC combines value decomposition with component-specific risk assessment, allowing each objective to adopt its own tolerance for uncertainty: collision avoidance, for example, can employ extreme conservatism while efficiency permits optimistic planning. A novel influence-based mechanism dynamically adjusts component weights according to their decision relevance and learning progress, eliminating the need for fixed weights or prior knowledge of the reward structure. The result is policies that are simultaneously safe, robust to model perturbations, and less conservative than prior approaches. We prove that MPAC converges to a fixed point of a distributionally robust optimization problem with component-specific ambiguity sets, providing theoretical justification for its design. Empirically, across continuous-control benchmarks with safety constraints and perturbed dynamics, MPAC achieves superior Pareto trade-offs, maintaining high reward while matching or exceeding the safety of dedicated baselines. These results demonstrate that adaptively weighting decomposed objectives under uncertainty is a principled and practical path toward robust, safe RL.
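The abstract gives no implementation details, so the following is only a minimal illustrative sketch of the two ideas it names (component-specific risk assessment and influence-based weighting); every function name, the mean-minus-std risk form, and the spread-based influence score here are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def risk_adjusted_value(q_ensemble, kappa):
    """Mean-minus-std risk adjustment over an ensemble of Q-estimates.

    kappa > 0 penalizes uncertainty (conservative); kappa < 0 rewards it
    (optimistic). Hypothetical form; the paper's risk measure may differ
    (e.g., CVaR or distributional quantiles).
    """
    return q_ensemble.mean(axis=0) - kappa * q_ensemble.std(axis=0)

def influence_weights(component_values, temperature=1.0):
    """Weight each reward component by its decision relevance: the spread
    of its values across candidate actions. A component whose value barely
    changes with the action receives low weight.
    """
    spread = component_values.max(axis=1) - component_values.min(axis=1)
    w = np.exp(spread / temperature)
    return w / w.sum()

# Toy usage: 2 reward components (safety, efficiency), a 5-member critic
# ensemble, and 4 candidate actions. Safety uses a conservative kappa,
# efficiency an optimistic one -- per-component risk sensitivity.
rng = np.random.default_rng(0)
q = rng.normal(size=(2, 5, 4))            # (component, ensemble, action)
kappas = np.array([2.0, -0.5])            # conservative vs. optimistic
vals = np.stack([risk_adjusted_value(q[c], kappas[c]) for c in range(2)])
w = influence_weights(vals)               # adaptive component weights
scores = (w[:, None] * vals).sum(axis=0)  # weighted composite per action
best_action = int(scores.argmax())
```

In this sketch the weights are recomputed from current value estimates at each decision, which is one plausible reading of "dynamically adjusts component weights based on their decision relevance"; the paper's mechanism also incorporates learning progress, which is omitted here.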
Primary Area: reinforcement learning
Submission Number: 9027