Distributed Hybrid Kalman Temporal Differences for Reinforcement Learning

Mohammad Salimibeni, Parvin Malekzadeh, Arash Mohammadi, Konstantinos N. Plataniotis

Published: 2020, Last Modified: 06 Nov 2023, ACSSC 2020, Readers: Everyone

Abstract: The paper focuses on the development of model-free, distributed Reinforcement Learning (RL) algorithms for multi-agent networks. The goal is to learn optimal control policies directly from smart agents’ cooperative interactions with one another and with the environment. In model-free RL methods with a continuous state space, the value function typically needs to be approximated. In this regard, Deep Neural Networks (DNNs) provide an attractive modeling mechanism for approximating the value function from sample transitions. Direct use of DNN-based single-agent approaches, however, fails to fully overcome the complexities of multi-agent scenarios. In various multi-agent cooperative scenarios, Kalman-based methodologies can serve as an efficient alternative. Such an approach, however, commonly requires a priori information about the system (such as noise statistics) to perform efficiently. To address this challenge, the paper proposes a Distributed Hybrid (multiple-model) Kalman Temporal Difference (DH-KTD) framework. The proposed DH-KTD framework adapts and optimizes the parameters of the localized filters in a distributed fashion using the observed states and rewards. Experimental results on a multi-agent benchmark RL problem illustrate the efficacy of the proposed framework.
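
To make the Kalman temporal-difference idea concrete, the following is a minimal single-agent sketch, not the authors' DH-KTD algorithm. It assumes a linear value function V(s) = phi(s) . theta, treats the weights theta as the hidden state of a Kalman filter, and uses the reward as a noisy observation of (phi(s) - gamma * phi(s')) . theta. The multiple-model part is approximated here by running one filter per candidate measurement-noise variance and reweighting the filters by the likelihood of their innovations, which is one common way to cope with unknown noise statistics. All names (`ktd_update`, `multiple_model_step`, `run_demo`) and all numeric values are hypothetical, and the distributed exchange between agents is not covered.

```python
import math


def ktd_update(theta, P, h, reward, q, r):
    """One Kalman TD step for a linear value function V(s) = phi(s) . theta.

    h is the TD observation vector phi(s) - gamma * phi(s_next); q and r are
    the ASSUMED process and measurement noise variances.
    Returns the updated weights, covariance, and innovation likelihood.
    """
    n = len(theta)
    # Predict: a random-walk model on the weights adds q to the diagonal.
    P_pred = [[P[i][j] + (q if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Innovation: observed reward minus the reward predicted by the TD model.
    innovation = reward - sum(h[i] * theta[i] for i in range(n))
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    s = sum(h[i] * Ph[i] for i in range(n)) + r  # innovation variance
    gain = [Ph[i] / s for i in range(n)]
    theta_new = [theta[i] + gain[i] * innovation for i in range(n)]
    P_new = [[P_pred[i][j] - gain[i] * s * gain[j] for j in range(n)]
             for i in range(n)]
    # Gaussian likelihood of the innovation, used to score this model.
    likelihood = (math.exp(-0.5 * innovation ** 2 / s)
                  / math.sqrt(2.0 * math.pi * s))
    return theta_new, P_new, likelihood


def multiple_model_step(filters, weights, h, reward, q, r_candidates):
    """Run one KTD step per candidate noise variance and reweight the models."""
    updated, scores = [], []
    for (theta, P), w, r in zip(filters, weights, r_candidates):
        theta, P, lik = ktd_update(theta, P, h, reward, q, r)
        updated.append((theta, P))
        scores.append(w * lik)
    total = sum(scores)
    if total > 0.0:
        weights = [sc / total for sc in scores]
    else:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0 / len(scores)] * len(scores)
    # Fused estimate: weight-averaged parameters across the filter bank.
    fused = [sum(w * th[i] for w, (th, _) in zip(weights, updated))
             for i in range(len(h))]
    return updated, weights, fused


def run_demo(steps=200):
    """Toy problem: one state with a self-loop, reward 1, gamma 0.9 (V = 10)."""
    gamma, reward = 0.9, 1.0
    phi = [1.0]                                  # single constant feature
    h = [phi[0] - gamma * phi[0]]                # phi(s) - gamma * phi(s')
    r_candidates = [0.01, 1.0]                   # hypothetical noise variances
    filters = [([0.0], [[10.0]]) for _ in r_candidates]
    weights = [0.5, 0.5]
    fused = [0.0]
    for _ in range(steps):
        filters, weights, fused = multiple_model_step(
            filters, weights, h, reward, 1e-4, r_candidates)
    return weights, fused


if __name__ == "__main__":
    final_weights, final_theta = run_demo()
    print("model weights:", final_weights)
    print("fused value estimate:", final_theta[0])
```

In this toy setting the true value is 1 / (1 - 0.9) = 10, so the fused estimate should approach 10 from below while the model weights shift toward the filter whose assumed noise variance better explains the observed rewards.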
