Newton-based Policy Search for Networked Multi-agent Reinforcement Learning

Published: 01 Jan 2022, Last Modified: 13 May 2023. Venue: CDC 2022.
Abstract: Newton’s method is a standard optimization algorithm, characterized by a fast convergence rate and underlying many popular approximate methods that exploit second-order information. Despite its well-understood theoretical properties, quadratic convergence rate, and broad applicability, Newton’s method is seldom used for policy optimization in Multi-Agent Reinforcement Learning. In this work we investigate a distributed Newton consensus scheme for policy search in a networked cooperative environment, where the agents hold private local rewards yet collaborate to maximize the network-wide averaged long-term return. In the proposed algorithm, the agents seek the parameters of the optimal global policy by locally computing an approximate Newton direction for the global objective function and sequentially updating it in a distributed fashion through an average consensus procedure. The strategy is purely policy-based and does not involve any representation of the global value function. We analyse the computational and theoretical properties of the algorithm and prove, under suitable assumptions, global convergence to the true maximizer. We additionally provide convergence guarantees under finite-sample conditions. Besides the theoretical analysis, we present numerical experiments that validate the approach and highlight its improved convergence speed compared to a simpler first-order distributed method.
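The abstract describes agents that locally approximate a Newton direction for the global objective and combine their updates through average consensus. The paper's actual policy-gradient estimators are not given here, so the following is only a minimal sketch of the general idea on surrogate objectives: every agent holds a private concave quadratic "return" (a stand-in assumption, not the paper's MARL setting), runs a few consensus rounds to approximate the network-averaged gradient and Hessian, and then takes a Newton ascent step. The ring-network mixing matrix `W`, the quadratic objectives, and all constants are illustrative choices, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 3

# Stand-in private objectives (assumption): each agent i maximizes
# J_i(theta) = -0.5 (theta - c_i)^T A_i (theta - c_i);
# the network goal is to maximize the average of the J_i.
A = [np.diag(rng.uniform(0.5, 2.0, size=dim)) for _ in range(n_agents)]
c = [rng.normal(size=dim) for _ in range(n_agents)]

def grad(i, th):   # gradient of agent i's local objective
    return -A[i] @ (th - c[i])

def hess(i, th):   # Hessian of agent i's local objective (constant here)
    return -A[i]

# Doubly stochastic mixing matrix for a 4-agent ring network (assumption).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

theta = np.zeros((n_agents, dim))     # one parameter copy per agent
n_iters, consensus_rounds = 30, 10

for _ in range(n_iters):
    # Local first- and second-order information.
    G = np.stack([grad(i, theta[i]) for i in range(n_agents)])
    H = np.stack([hess(i, theta[i]).ravel() for i in range(n_agents)])
    # Average consensus rounds approximate the global gradient/Hessian.
    for _ in range(consensus_rounds):
        G, H = W @ G, W @ H
    # Each agent takes an (approximate) global Newton ascent step, then
    # parameters are mixed once more over the network.
    for i in range(n_agents):
        theta[i] = theta[i] - np.linalg.solve(H[i].reshape(dim, dim), G[i])
    theta = W @ theta

# True maximizer of the averaged objective, for comparison.
opt = np.linalg.solve(sum(A), sum(A[i] @ c[i] for i in range(n_agents)))
```

Because the surrogate objectives are quadratic, an exact averaged Newton step would reach the maximizer in one iteration; with only finitely many consensus rounds the agents instead converge to a small neighbourhood of `opt`, which mirrors the role of the consensus error in the scheme described above.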