Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach

Anton Plaksin; Vitaly Kalev

Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach

Anton Plaksin, Vitaly Kalev

21 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: robust reinforcement learning, zero-sum games, deep q-learning

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training robust to uncertainty or disturbances models, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second adversarial agent, and thus, the problem is reduced to seeking the agents' policies robust to any opponent's actions. This paper is the first to propose considering the RRL problems within the positional differential game theory, which helps us to obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both minimax and maximin Bellman equations, and we also indicate conditions when this Q-function can be decomposed. Based on these results, we present the Isaacs Deep Q-Networks (IDQN) and Decomposed Isaacs Deep Q-Networks (DIDQN) algorithms, respectively. We analyze their performance by comparing them with other baseline RRL and Multi-Agent RL algorithms. We consider both simple environments with known accurate solutions and complex large-dimensional MuJoCo environments. In each experiment, we thoroughly evaluate the agents' policies obtained after learning, training opponents against them using various RL algorithms with various parameters. The experiment results demonstrate the superiority of the presented algorithms in all experiments under consideration.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

Supplementary Material: zip

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 3129

Loading