Keywords: Adversarial constraints, CMDPs, online learning
Abstract: In constrained MDPs (CMDPs) with adversarial rewards and constraints, a known impossibility result prevents any algorithm from attaining sublinear regret and sublinear constraint violation when competing against a best-in-hindsight policy that satisfies the constraints on average. In this paper, we show how to mitigate this negative result by considering settings that generalize both stochastic and adversarial CMDPs. We provide algorithms whose performance degrades smoothly as the level of environment adverseness increases. Specifically, they attain $\widetilde{\mathcal{O}}(\sqrt{T} + C)$ regret and positive constraint violation under bandit feedback, where $C$ measures the adverseness of rewards and constraints. In the worst case, $C = \Theta(T)$, consistently with the impossibility result for adversarial CMDPs. First, we design an algorithm with the desired guarantees when $C$ is known. Then, for the case in which $C$ is unknown, we obtain the same results by embedding multiple instances of this algorithm in a general meta-procedure, which suitably selects among them so as to balance the trade-off between regret and constraint violation.
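The abstract's meta-procedure for unknown $C$ can be illustrated with a minimal sketch. This is not the paper's algorithm: the factory `make_base_instance`, the doubling guesses $C_j = 2^j$, and the exponential-weights master used to select among instances are all assumptions made for illustration, standing in for whatever base algorithm and selection rule the paper actually uses.

```python
import math
import random


class UnknownCMetaProcedure:
    """Illustrative sketch (hypothetical, not the paper's method): run base
    instances tuned to guesses C_j = 2^j of the unknown adverseness level C,
    and pick among them with an exponential-weights master so that overall
    performance degrades gracefully as C grows."""

    def __init__(self, make_base_instance, horizon_T):
        # Hypothetical factory: make_base_instance(C_guess) returns an
        # algorithm with O(sqrt(T) + C_guess) guarantees whenever C <= C_guess.
        self.instances = [
            make_base_instance(2 ** j)
            for j in range(int(math.log2(horizon_T)) + 1)
        ]
        self.weights = [1.0] * len(self.instances)
        self.eta = math.sqrt(math.log(len(self.instances)) / horizon_T)

    def select_instance(self):
        # Sample an instance with probability proportional to its weight.
        total = sum(self.weights)
        r, acc = random.random() * total, 0.0
        for idx, weight in enumerate(self.weights):
            acc += weight
            if r <= acc:
                return idx, self.instances[idx]
        return len(self.instances) - 1, self.instances[-1]

    def update(self, idx, observed_payoff):
        # Multiplicative-weights update for the selected instance, using a
        # payoff in [0, 1] that trades off reward against constraint violation.
        self.weights[idx] *= math.exp(self.eta * observed_payoff)
```

In this sketch, the payoff fed to `update` would combine episodic reward and constraint violation; how that trade-off is defined is exactly the balancing issue the abstract attributes to the meta-procedure.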
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Anna_Lunghi1
Track: Regular Track: unpublished work
Submission Number: 10