Learning Optimal Policies in Mean Field Models with Kullback-Leibler Regularization

Published: 01 Jan 2023, Last Modified: 22 May 2024 · CDC 2023 · CC BY-SA 4.0
Abstract: The theory and application of mean field games have grown significantly since the field's origins less than two decades ago. This paper considers a special class in which the game is cooperative and the cost includes a control penalty defined by Kullback-Leibler (KL) divergence, as commonly used in reinforcement learning and other fields. KL divergence is often preferred as a control cost or regularizer because the resulting optimal control problem admits a tractable solution. This paper considers a particular control paradigm called Kullback-Leibler Quadratic (KLQ) optimal control, and arrives at the following conclusions: 1. In application to distributed control of electric loads, a new modeling technique is introduced to obtain a simple Markov model for each load (the 'agent' in mean field theory). 2. It is argued that the optimality equations may be solved using Monte-Carlo techniques, a specialized version of stochastic gradient descent (SGD). 3. The use of averaging minimizes the asymptotic covariance of the SGD algorithm; the form of the optimal covariance is identified for the first time.
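For context, the following is a minimal sketch of the KLQ cost functional as it commonly appears in the Kullback-Leibler-Quadratic control literature. The nominal path-space model p_0, gain κ, output function y, and reference signal r_t are recalled from that literature and should be read as assumptions here, since the abstract does not define them.

```latex
% A common form of the KLQ objective: choose a controlled path-space
% distribution p that stays close (in KL divergence) to the nominal
% model p_0, while the aggregate output \bar{y}_t tracks a reference r_t.
\[
  J(p) \;=\; D(p \,\|\, p_0)
        \;+\; \frac{\kappa}{2} \sum_{t=0}^{T} \bigl( \bar{y}_t - r_t \bigr)^2 ,
  \qquad
  \bar{y}_t \;=\; \mathsf{E}_p\!\left[\, y(X_t) \,\right].
\]
```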
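The third conclusion concerns iterate averaging in SGD (Polyak-Ruppert averaging). The sketch below is a generic illustration of that technique on a scalar quadratic objective; the objective, noise model, and step-size exponent are illustrative assumptions, not the paper's algorithm.

```python
# A minimal sketch of SGD with Polyak-Ruppert iterate averaging, the
# technique the abstract credits with minimizing asymptotic covariance.
# The quadratic objective and Gaussian noise below are illustrative
# assumptions, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)

theta_star = 2.0     # minimizer of f(theta) = E[(theta - W)^2]/2, W ~ N(theta_star, 1)
n_iters = 100_000
rho = 0.7            # step-size exponent in (1/2, 1), the regime where averaging helps

theta = 0.0          # raw SGD iterate
theta_bar = 0.0      # running Polyak-Ruppert average of the iterates

for n in range(1, n_iters + 1):
    w = theta_star + rng.standard_normal()   # noisy observation
    grad = theta - w                         # unbiased stochastic gradient of f
    theta -= grad / n**rho                   # SGD step with slowly vanishing gain
    theta_bar += (theta - theta_bar) / n     # recursive average of all iterates

print(f"raw SGD iterate:  {theta:.4f}")
print(f"averaged iterate: {theta_bar:.4f}  (target {theta_star})")
```

With a slowly vanishing gain the raw iterate keeps fluctuating, while the running average converges with the optimal asymptotic covariance; this is the classical motivation for averaging.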