Visible to everyone since 13 Oct 2023
Policy learning for the targeted coordination of massive-scale populations of intelligent agents, which in the limit form a continuum, has been a missing component of reinforcement learning research. This work fills that gap by addressing the main challenge: the curse of dimensionality caused by the huge population size. To this end, we formulate such a population as a parameterized deterministic dynamical system, referred to as a group system, and introduce a novel moment representation of the system. Under this representation, we propose a nested reinforcement learning algorithm that learns the optimal policy for the system hierarchically. A significant advantage is that each level of the hierarchy preserves the optimality of all its lower-level children, which in turn leads to fast convergence of the nested algorithm.