Exploiting Reflectional Symmetry in Heterogeneous MORL

ICLR 2026 Conference Submission 19962 Authors

Published: 19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: heterogeneous multi-objective reinforcement learning, reflection equivariance, reward shaping
TL;DR: We propose PRISM, a novel algorithm for heterogeneous MORL with improved theoretical guarantees that significantly outperforms existing methods across diverse settings.
Abstract: This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives differ considerably in, among other properties, sparsity and magnitude. This heterogeneity can cause dense objectives to overshadow sparse but long-term rewards, leading to sample inefficiency. To address this issue, we propose Parallel Reward Integration with reflectional Symmetry for heterogeneous MORL (PRISM), a novel algorithm that aligns reward channels and enforces reflectional symmetry as an inductive bias. We design ReSymNet, a theory-inspired model that aligns the temporal frequency and magnitude of rewards across objectives, leveraging residual blocks to gradually learn a `scaled opportunity value' that accelerates exploration while preserving the optimal policy. Building on the aligned reward objectives, we then propose SymReg, a reflectional-equivariance regulariser that enforces symmetry under agent mirroring. SymReg constrains the policy search to a reflection-equivariant subspace with provably reduced hypothesis complexity, thereby improving generalisability. Across MuJoCo benchmarks, PRISM consistently outperforms both the baseline and an oracle with full dense rewards in Pareto coverage and distributional balance, achieving hypervolume gains of over 100\% over the baseline and up to 32\% over the oracle. Code is available at \url{https://anonymous.4open.science/r/reward_shaping-1CCB}.
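To make the reflectional-equivariance idea concrete, the sketch below shows one plausible form of a penalty in the spirit of SymReg, written in PyTorch. It is not the authors' implementation: the function names, the mirroring permutations, and the sign flips are illustrative assumptions; a real MuJoCo agent would need task-specific reflection maps for its state and action spaces.

```python
# Hedged sketch of a reflection-equivariance regulariser (assumed form, not the paper's code).
# Idea: pi(reflect_S(s)) should equal reflect_A(pi(s)); we penalise the squared deviation.
import torch
import torch.nn as nn


def reflect(x: torch.Tensor, perm: torch.Tensor, signs: torch.Tensor) -> torch.Tensor:
    """Apply a reflection: permute coordinates (e.g. swap left/right limbs)
    and flip the sign of lateral components."""
    return x[:, perm] * signs


def symreg_loss(policy: nn.Module,
                states: torch.Tensor,
                s_perm: torch.Tensor, s_signs: torch.Tensor,
                a_perm: torch.Tensor, a_signs: torch.Tensor) -> torch.Tensor:
    """Penalise deviation from reflection equivariance of the (deterministic) policy mean."""
    actions = policy(states)                                  # actions for original states
    mirrored_actions = policy(reflect(states, s_perm, s_signs))
    target = reflect(actions, a_perm, a_signs)                 # what equivariance predicts
    return ((mirrored_actions - target) ** 2).mean()


if __name__ == "__main__":
    # Toy example: 8-dim states, 2-dim actions, a simple MLP policy.
    policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
    states = torch.randn(32, 8)
    s_perm = torch.arange(8)                       # identity permutation (placeholder)
    s_signs = torch.ones(8); s_signs[1::2] = -1.0  # flip alternating coords (placeholder)
    a_perm = torch.tensor([1, 0])                  # swap the two action dims (placeholder)
    a_signs = torch.ones(2)
    reg = symreg_loss(policy, states, s_perm, s_signs, a_perm, a_signs)
    print(reg.item())  # would be added to the RL objective with a small coefficient
```

Under this reading, the regulariser restricts the effective hypothesis space to (approximately) reflection-equivariant policies, which is consistent with the generalisation argument in the abstract.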
Primary Area: reinforcement learning
Submission Number: 19962