Reward Drops in Learning-based Control with an Experimental Validation on Microdrones

Published: 2024 · Last Modified: 14 May 2025 · CDC 2024 · CC BY-SA 4.0
Abstract: In this paper, we consider a computationally efficient learning-based control mechanism that handles reward drops in a zero-sum game setting. The problem is formulated as online learning of the Nash equilibrium without requiring any information on the system dynamics: it is first cast as an infinite-horizon optimal control problem and then recast in an online model-free Q-learning framework composed of a critic network and two actor networks (one for the control input and one for the disturbance input). The closed-loop system is proved to have a stable equilibrium point even in the presence of reward drops. The efficacy of the learning-based controller is validated through simulations and experiments on micro drones.
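The abstract gives no implementation details, so the following is only a rough illustration of the kind of model-free zero-sum Q-learning it describes: value-iteration-style updates of a quadratic Q-function for a two-player (control vs. disturbance) linear-quadratic game, learned purely from measured transitions. The plant matrices, gains, and hyperparameters below are assumptions for the sketch, not the paper's setup; the plant model is used only to generate data, and the learner itself never touches `A`, `B`, or `D`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant, used only to generate transitions; the learner is model-free.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])   # control input channel
D = np.array([[0.0], [1.0]])   # disturbance input channel
n, m, q = 2, 1, 1
Qx, Ru, gamma2 = np.eye(n), np.eye(m), 25.0   # stage cost x'Qx + u'Ru - gamma^2 w'w
d = n + m + q
iu = np.triu_indices(d)

def phi(z):
    """Quadratic features so that Q(z) = z' H z is linear in the parameters."""
    M = np.outer(z, z)
    M = 2 * M - np.diag(np.diag(M))   # double the off-diagonal monomials
    return M[iu]

def gains(H):
    """Nash policies u = -K x, w = L x from the stationarity conditions of z' H z."""
    S = H[n:, n:]                       # [[Huu, Huw], [Hwu, Hww]]
    G = -np.linalg.solve(S, H[n:, :n])  # [u; w] = G x
    return -G[:m], G[m:]                # K, L

H = np.zeros((d, d))
for it in range(60):                    # value-iteration-style Q-learning sweeps
    K, L = (np.zeros((m, n)), np.zeros((q, n))) if it == 0 else gains(H)
    Phi, y = [], []
    for _ in range(120):                # collect measured transitions
        x = rng.normal(size=n)
        u = -K @ x + 0.5 * rng.normal(size=m)   # exploration noise on both players
        w = L @ x + 0.5 * rng.normal(size=q)
        xn = A @ x + B @ u + D @ w
        r = x @ Qx @ x + u @ Ru @ u - gamma2 * (w @ w)
        zn = np.concatenate([xn, -K @ xn, L @ xn])
        Phi.append(phi(np.concatenate([x, u, w])))
        y.append(r + zn @ H @ zn)       # Bellman target with the previous Q
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    Hn = np.zeros((d, d)); Hn[iu] = theta
    H = Hn + Hn.T - np.diag(np.diag(Hn))

K, L = gains(H)
rho = max(abs(np.linalg.eigvals(A - B @ K + D @ L)))  # closed-loop spectral radius
```

Here the least-squares fit plays the role of the critic and `gains(H)` plays the role of the two actors. The attenuation level `gamma2 = 25` is deliberately chosen well above what this plant needs so that the game value exists; for too small a gamma the iteration diverges, mirroring the solvability condition of the underlying H-infinity problem.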