Keywords: Multi-Agent Reinforcement Learning, Multi-scale, Applications, Wind Farms
TL;DR: The wind farm control problem can be framed as a Transition-Independent Dec-POMDP where agent dynamics are represented as a DAG: this problem structure allows us to guarantee convergence for independent Q-learning agents learning at different scales.
Abstract: Maximizing the energy production of wind farms requires mitigating wake effects, a phenomenon by which wind turbines create sub-optimal wind conditions for the turbines located downstream. Finding optimal control strategies is, however, challenging, as high-fidelity models predicting the complex aerodynamics are not tractable for optimization. Good experimental results have been obtained by framing wind farm control as a cooperative multi-agent reinforcement learning problem. In particular, several experiments have used an independent learning approach, leading to a significant increase in power output in simulated farms. Despite this empirical success, the independent learning approach has no convergence guarantee due to non-stationarity. We show that the wind farm control problem can be framed as an instance of a transition-independent Decentralized Partially Observable Markov Decision Process (Dec-POMDP) in which the interdependence of the agents' dynamics can be represented by a directed acyclic graph (DAG). For these problems, non-stationarity can be mitigated by a multi-scale approach, and we show that a multi-scale Q-learning algorithm (MQL), in which agents update their local Q-learning iterates at different timescales, is guaranteed to converge.
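To make the multi-scale idea concrete, below is a minimal, illustrative Python sketch of independent Q-learning agents arranged along a chain DAG (upstream to downstream turbines), where each agent updates its local Q-table on its own timescale so that upstream policies look quasi-stationary to downstream learners. The environment, reward model, and all names (`local_reward`, `next_state`, `update_period`) are hypothetical placeholders, not the paper's actual simulator or algorithm implementation.

```python
import numpy as np

# Toy setup (assumed for illustration): 3 turbines in a line, each choosing a
# discrete yaw action. The wind state observed by a downstream turbine depends
# only on the actions of turbines upstream of it (transition independence with
# a DAG structure), and each agent receives a purely local reward.

rng = np.random.default_rng(0)
n_agents, n_actions, n_states = 3, 3, 3

def local_reward(state, action):
    # Placeholder reward: the best action depends on the locally observed wind state.
    return 1.0 - 0.5 * abs(action - state)

def next_state(upstream_actions):
    # Downstream wind state determined by upstream actions (illustrative wake model).
    return int(np.sum(upstream_actions) % n_states) if upstream_actions else 0

Q = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]
# Multi-scale idea: agents further downstream update less frequently, so the
# behaviour of their upstream neighbours is quasi-stationary between updates.
update_period = [1, 10, 100]
alpha, gamma, eps = 0.1, 0.9, 0.1
states = [0] * n_agents

for t in range(20000):
    # Epsilon-greedy action from each agent's current local Q-table.
    actions = []
    for i in range(n_agents):
        if rng.random() < eps:
            actions.append(int(rng.integers(n_actions)))
        else:
            actions.append(int(np.argmax(Q[i][states[i]])))
    # Each agent performs its Q-learning update only on its own timescale.
    for i in range(n_agents):
        if t % update_period[i] == 0:
            r = local_reward(states[i], actions[i])
            s_next = next_state(actions[:i])
            Q[i][states[i], actions[i]] += alpha * (
                r + gamma * Q[i][s_next].max() - Q[i][states[i], actions[i]]
            )
    # Each turbine's next local state is induced by its upstream actions (DAG).
    states = [next_state(actions[:i]) for i in range(n_agents)]

print([np.argmax(Q[i], axis=1) for i in range(n_agents)])
```

This sketch only illustrates the timescale separation; it omits the Dec-POMDP formalism, the convergence conditions, and the wind farm dynamics analyzed in the paper.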
Submission Number: 78