Mitigating Constraint Conflict in Offline RL: An Adaptive Weighted Constraint Approach

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, AAMAS 2026 Extended Abstract, CC BY 4.0
Keywords: Offline Reinforcement Learning, Constraint Conflict, Policy Regularization, Geometric Median
Abstract: Offline Reinforcement Learning (RL) allows agents to learn effective policies from pre-collected datasets by imposing conservative constraints that address the out-of-distribution problem. However, existing methods face a critical challenge of constraint conflict on datasets generated by multiple behavior policies: the diverse policies suggest conflicting actions that drive the agent towards suboptimal performance. Existing remedies, such as geometric distance-based and advantage-weighted methods, exhibit several limitations, including sensitivity to low-quality data, high computational cost, and a tendency to learn overly conservative policies. To overcome these limitations, we propose a novel method, Adaptive Weighted Constraint (AWC). Our approach mitigates constraint conflicts by training a constraint network via adaptive weighted behavior cloning. The key idea is to dynamically assign importance weights to dataset actions based on their consistency with the current policy, so that the constraint reflects both the behavior data and its distance to the policy. Inspired by the robustness of central tendency estimators in statistics, we use the weighted geometric median of the actions as a stable target for the policy constraint. Theoretically, we prove that the constraint network converges to a stable regression target and that the learned policy enjoys a bounded suboptimality guarantee. Experiments on D4RL benchmarks demonstrate that our approach achieves state-of-the-art performance.
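To make the mechanism concrete, below is a minimal sketch (not the authors' implementation) of how adaptive weights and a weighted geometric median target could be computed for a set of conflicting dataset actions at one state. The exponential weighting scheme, the `temperature` parameter, and the Weiszfeld-style solver are illustrative assumptions; the paper's actual constraint-network training is not reproduced here.

```python
# Illustrative sketch only: names (adaptive_weights, temperature, policy_action)
# are hypothetical and not taken from the paper.
import numpy as np

def adaptive_weights(candidate_actions, policy_action, temperature=1.0):
    """Weight dataset actions by their consistency with the current policy:
    actions closer to the policy's action receive larger weights."""
    dists = np.linalg.norm(candidate_actions - policy_action, axis=1)
    logits = -dists / temperature
    logits -= logits.max()                      # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def weighted_geometric_median(points, weights, n_iters=100, eps=1e-8):
    """Weiszfeld iterations for the weighted geometric median of `points`."""
    median = np.average(points, axis=0, weights=weights)  # start at weighted mean
    for _ in range(n_iters):
        dists = np.linalg.norm(points - median, axis=1) + eps
        inv = weights / dists
        new_median = (inv[:, None] * points).sum(axis=0) / inv.sum()
        if np.linalg.norm(new_median - median) < eps:
            break
        median = new_median
    return median

# Example: three conflicting behavior actions suggested for the same state.
actions = np.array([[1.0, 0.0], [-1.0, 0.0], [0.9, 0.1]])
pi_a = np.array([0.8, 0.0])                     # current policy's action
w = adaptive_weights(actions, pi_a, temperature=0.5)
target = weighted_geometric_median(actions, w)  # stable constraint target
```

In this toy example the outlying action [-1.0, 0.0] receives a small weight, so the constraint target stays near the cluster of actions consistent with the current policy rather than being pulled toward the conflicting mode.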
Area: Learning and Adaptation (LEARN)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 1623