Faster algorithm and sharper analysis for constrained Markov decision process

Published: 2024 · Last Modified: 16 May 2025 · Oper. Res. Lett. 2024 · CC BY-SA 4.0
Abstract: The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated reward subject to constraints on its utilities/costs. We propose a new primal-dual approach with a novel integration of entropy regularization and Nesterov's accelerated gradient method. The proposed approach is shown to converge to the global optimum with a complexity of Õ(1/ϵ) in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approaches by a factor of O(1/ϵ).
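The general shape of an entropy-regularized primal-dual scheme with Nesterov-accelerated dual updates can be sketched on a toy one-shot surrogate of a CMDP: choose a distribution x over actions to maximize r·x subject to c·x ≥ b. This is only an illustrative sketch, not the paper's algorithm; all symbols and numeric values (r, c, b, τ, η, β) are assumptions made up for the example. With entropy regularization of strength τ, the primal maximizer for a fixed multiplier has the closed softmax form, and the dual variable is updated by projected gradient descent with Nesterov extrapolation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy one-shot surrogate of a CMDP (all values illustrative):
# maximize r.x subject to c.x >= b over the probability simplex.
r = np.array([1.0, 0.5, 0.0])   # per-action rewards
c = np.array([0.0, 0.5, 1.0])   # per-action utilities
b = 0.4                          # utility constraint threshold
tau = 0.1                        # entropy-regularization strength
eta, beta = 0.1, 0.9             # dual step size, Nesterov momentum

lam = lam_prev = 0.0
for _ in range(2000):
    y = lam + beta * (lam - lam_prev)     # Nesterov extrapolation on the dual
    x = softmax((r + y * c) / tau)        # entropy-regularized primal step (closed form)
    # Projected dual gradient step: the dual gradient is c.x - b, and lam >= 0.
    lam_prev, lam = lam, max(0.0, y - eta * (float(c @ x) - b))

x = softmax((r + lam * c) / tau)
print(float(r @ x), float(c @ x))  # achieved reward and utility (utility near b when the constraint is active)
```

Entropy regularization is what makes the primal step available in closed form and the dual function smooth, which is the structural property that lets an accelerated dual method be applied; the actual CMDP algorithm and its Õ(1/ϵ) analysis operate on value functions and occupancy measures rather than this single-step surrogate.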