Time Optimal Data Harvesting in Two Dimensions through Reinforcement Learning Without Engineered Reward Functions

Published: 01 Jan 2023 · Last Modified: 04 Aug 2025 · ACC 2023 · CC BY-SA 4.0
Abstract: We consider the problem of harvesting data from a set of targets distributed throughout a two-dimensional environment. The targets broadcast their data to an agent flying above them, and the goal is for the agent to extract all the data and reach a desired final position in minimum time. While previous work developed optimal controllers for the one-dimensional version of the problem, such methods do not extend to the 2-D setting. We therefore first convert the problem into a discrete-time Markov decision process and then apply reinforcement learning, using double deep Q-learning to find high-performing solutions. We use a simple binary cost function that directly captures the desired goal, and we overcome the sparsity of these rewards by incorporating hindsight experience replay. To improve learning efficiency, we also use prioritized sampling of the replay buffer. We demonstrate our approach through several simulations, which show performance comparable to an existing optimal controller in the 1-D setting, and we explore the effect of both the replay buffer and the prioritized sampling in the 2-D setting.
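To make the combination of a sparse binary reward and hindsight experience replay concrete, the following is a minimal Python sketch of HER-style goal relabeling with the "future" strategy. All names (e.g. `Transition`, `goal_reached`, `her_relabel`), the 0/-1 reward convention, and the treatment of the goal as a 2-D position are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch: HER goal relabeling with a sparse binary reward.
# Assumes states carry the agent's 2-D position in their first two entries.
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action next_state goal reward done")

def goal_reached(state, goal, tol=0.5):
    """Binary success test: agent position within `tol` of the goal position."""
    return abs(state[0] - goal[0]) <= tol and abs(state[1] - goal[1]) <= tol

def binary_reward(state, goal):
    """Sparse reward: 0 on success, -1 otherwise (a common convention with HER)."""
    return 0.0 if goal_reached(state, goal) else -1.0

def her_relabel(episode, k=4):
    """Augment an episode with copies of each transition whose goal is replaced
    by a state actually visited later in the same episode ('future' strategy)."""
    augmented = []
    for t, tr in enumerate(episode):
        augmented.append(tr)
        future = episode[t:]
        for _ in range(min(k, len(future))):
            new_goal = random.choice(future).next_state[:2]  # relabeled goal
            r = binary_reward(tr.next_state, new_goal)
            augmented.append(tr._replace(goal=new_goal, reward=r,
                                         done=goal_reached(tr.next_state, new_goal)))
    return augmented
```

The relabeled transitions give the Q-learning agent frequent successful examples even when the original goal is rarely reached, which is what makes the sparse binary cost tractable without reward engineering.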