Keywords: Reinforcement Learning, POMDP, Navigation, Fluid Mechanics, Turbulence.
Abstract: We consider the problem of navigating in a fluid flow while being carried by it,
using only information accessible from on-board sensors. This POMDP (partially
observable Markov decision process) is particularly challenging because to behave
optimally, the agent has to exploit coherent structures that exist in the flow without
observing them directly, and while being subjected to chaotic dynamics. Yet this
problem is commonly faced by autonomous robots deployed in the oceans and drifting
with the flow (e.g., for environmental monitoring). While some attempts have been made
to use reinforcement learning for navigation in partially observable flows, progress
has been limited by the lack of well-defined benchmarks and baselines for this
application. In this paper, we first introduce a well-posed navigation POMDP for
which a near-optimal policy is known analytically, thereby allowing for a critical
assessment of reinforcement learning methods applied to autonomous navigation
in complex flows. We then evaluate the 'vanilla' learning algorithms commonly
used in the fluid mechanics community (Advantage Actor Critic, Q-Learning)
and report on their poor performance. Finally, we provide an implementation of
PPO (Proximal Policy Optimization) able to match the theoretical near-optimal
performance. This demonstrates the feasibility of learning autonomous navigation
strategies in complex flows as encountered in the oceans.
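The abstract mentions, but does not include, the benchmark POMDP or the PPO implementation. As a rough illustration only, the sketch below trains PPO on a toy partially observable flow-navigation task. Everything in it is our assumption rather than the paper's setup: the hypothetical FlowNavEnv environment, its Taylor-Green-like cellular flow standing in for turbulence, the reward shaping, and all hyperparameters. It relies on the standard Gymnasium and Stable-Baselines3 APIs.

```python
# Minimal sketch (NOT the paper's benchmark or implementation): PPO on a toy
# POMDP where a swimmer advected by a steady cellular flow must travel upward,
# observing only the local flow velocity, never its own position.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class FlowNavEnv(gym.Env):
    """Hypothetical environment: all parameters and dynamics are illustrative."""

    def __init__(self, u0=1.0, v_swim=0.8, dt=0.05, max_steps=400):
        self.u0, self.v_swim, self.dt, self.max_steps = u0, v_swim, dt, max_steps
        self.y_goal = 4 * np.pi  # reach this height to succeed
        # Action: swimming direction (an angle in [-pi, pi]).
        self.action_space = spaces.Box(-np.pi, np.pi, shape=(1,), dtype=np.float32)
        # Observation: local flow velocity (u, v) only -- partial observability.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _flow(self, x, y):
        # Steady Taylor-Green-like cellular flow, a crude stand-in for turbulence.
        u = -self.u0 * np.cos(x) * np.sin(y)
        v = self.u0 * np.sin(x) * np.cos(y)
        return np.array([u, v], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(0.0, 2 * np.pi, size=2)
        self.steps = 0
        return self._flow(*self.pos), {}

    def step(self, action):
        theta = float(action[0])
        swim = self.v_swim * np.array([np.cos(theta), np.sin(theta)])
        vel = self._flow(*self.pos) + swim  # advection plus self-propulsion
        old_y = self.pos[1]
        self.pos = self.pos + self.dt * vel
        self.steps += 1
        reward = float(self.pos[1] - old_y)  # reward vertical progress
        terminated = bool(self.pos[1] >= self.y_goal)
        truncated = self.steps >= self.max_steps
        return self._flow(*self.pos), reward, terminated, truncated, {}

env = FlowNavEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # tiny budget, for illustration only
```

In a setting like this, a feed-forward "MlpPolicy" sees only the instantaneous local velocity; matching a near-optimal policy under partial observability may require memory (e.g., a recurrent policy) or observation stacking, which the one-liner above deliberately omits.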
Supplementary Material: zip
Submission Number: 96