Keywords: Reinforcement Learning, POMDP, Navigation, Fluid Mechanics, Turbulence.
Abstract: We consider the problem of navigating in a fluid flow while being carried by it,
using only information accessible from on-board sensors. This POMDP (partially
observable Markov decision process) is particularly challenging because to behave
optimally, the agent has to exploit coherent structures that exist in the flow without
observing them directly, and while being subjected to chaotic dynamics. Yet this
problem is commonly faced by autonomous robots deployed in the oceans and drifting
with the flow (e.g., for environmental monitoring). While some attempts have been made
to use reinforcement learning for navigation in partially observable flows, progress
has been limited by the lack of well-defined benchmarks and baselines for this
application. In this paper, we first introduce a well-posed navigation POMDP for
which a near-optimal policy is known analytically, thereby allowing for a critical
assessment of reinforcement learning methods applied to autonomous navigation
in complex flows. We then evaluate the 'vanilla' learning algorithms commonly
used in the fluid mechanics community (Advantage Actor Critic, Q-Learning)
and report on their poor performance. Finally, we provide an implementation of
PPO (Proximal Policy Optimization) able to match the theoretical near-optimal
performance. This demonstrates the feasibility of learning autonomous navigation
strategies in complex flows as encountered in the oceans.
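The abstract mentions, but does not include, the benchmark POMDP or the PPO implementation. As a rough illustration only, the sketch below trains PPO on a toy partially observable flow-navigation task. Everything in it is our assumption rather than the paper's setup: the hypothetical FlowNavEnv environment, its Taylor-Green-like cellular flow standing in for turbulence, the reward shaping, and all hyperparameters. It relies on the standard Gymnasium and Stable-Baselines3 APIs.

```python
# Minimal sketch (NOT the paper's benchmark or implementation): PPO on a toy
# POMDP where a swimmer advected by a steady cellular flow must travel upward,
# observing only the local flow velocity, never its own position.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class FlowNavEnv(gym.Env):
    """Hypothetical environment: all parameters and dynamics are illustrative."""

    def __init__(self, u0=1.0, v_swim=0.8, dt=0.05, max_steps=400):
        self.u0, self.v_swim, self.dt, self.max_steps = u0, v_swim, dt, max_steps
        self.y_goal = 4 * np.pi  # reach this height to succeed
        # Action: swimming direction (an angle in [-pi, pi]).
        self.action_space = spaces.Box(-np.pi, np.pi, shape=(1,), dtype=np.float32)
        # Observation: local flow velocity (u, v) only -- partial observability.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _flow(self, x, y):
        # Steady Taylor-Green-like cellular flow, a crude stand-in for turbulence.
        u = -self.u0 * np.cos(x) * np.sin(y)
        v = self.u0 * np.sin(x) * np.cos(y)
        return np.array([u, v], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(0.0, 2 * np.pi, size=2)
        self.steps = 0
        return self._flow(*self.pos), {}

    def step(self, action):
        theta = float(action[0])
        swim = self.v_swim * np.array([np.cos(theta), np.sin(theta)])
        vel = self._flow(*self.pos) + swim  # advection plus self-propulsion
        old_y = self.pos[1]
        self.pos = self.pos + self.dt * vel
        self.steps += 1
        reward = float(self.pos[1] - old_y)  # reward vertical progress
        terminated = bool(self.pos[1] >= self.y_goal)
        truncated = self.steps >= self.max_steps
        return self._flow(*self.pos), reward, terminated, truncated, {}

env = FlowNavEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # tiny budget, for illustration only
```

In a setting like this, a feed-forward "MlpPolicy" sees only the instantaneous local velocity; matching a near-optimal policy under partial observability may require memory (e.g., a recurrent policy) or observation stacking, which the one-liner above deliberately omits.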
Supplementary Material: zip
Submission Number: 96