- Keywords: adversarial reinforcement learning, mean-field optimal control, generalization
- Abstract: Adversarial reinforcement learning has shown promise in solving games in adversarial environments, but its theoretical understanding remains immature. This paper theoretically analyses the convergence and generalization of adversarial reinforcement learning under the mean-field optimal control framework. A new mean-field Pontryagin's maximum principle is proposed for reinforcement learning with implicit terminal constraints. Applying the Hamilton-Jacobi-Isaacs equation and a mean-field two-sided extremum principle (TSEP), adversarial reinforcement learning is modeled as a mean-field quantitative differential game between two constrained dynamical systems. These results provide the necessary conditions for the convergence of the global solution to the mean-field TSEP. The global solution is also unique when the terminal time is sufficiently small. Moreover, two generalization bounds are derived via Hoeffding's inequality and algorithmic stability. Neither bound explicitly depends on the dimensions, norms, or other capacity measures of the parameters, which are usually prohibitively large in deep learning. The bounds help characterize how algorithmic randomness facilitates the generalization of adversarial reinforcement learning. The techniques may also be helpful in modeling other adversarial learning algorithms.
- One-sentence Summary: This paper theoretically analyses the convergence and generalization of adversarial reinforcement learning under the mean-field optimal control framework.