LLM-Driven Pareto-Optimal Multi-Mode Reinforcement Learning for Adaptive UAV Navigation in Urban Wind Environments
Abstract: Autonomous drones in complex urban wind environments must balance speed, safety, and energy efficiency under highly variable conditions. Traditional single-policy reinforcement learning (RL) controllers often perform poorly in scenarios outside their training conditions. We introduce a Pareto-optimal multi-mode framework that trains three specialized unmanned aerial vehicle (UAV) policies (aggressive, balanced, and cautious) via proximal policy optimization (PPO) with mode-specific reward scalings, yielding controllers that collectively span the speed-safety-energy trade-off surface. To automate mode selection, we fine-tune a large language model (LLM) on 30,000 simulation-derived environment-performance tuples, allowing it to predict the optimal policy from building density, wind speed and orientation, battery state, and recent flight history. In a Unity-based Manhattan simulation with computational fluid dynamics (CFD) wind fields across four headings and 10 speed levels, the LLM-driven decision maker reduces average flight time by 16%, lowers the collision rate by 50%, and saves 18% energy compared to any single mode, while preserving nondominated trade-off performance. The decision maker also generalizes to unseen wind patterns and layouts without handcrafted heuristics, demonstrating the promise of combining Pareto-optimal RL with LLM-based meta-decision making for UAV autonomy.
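To make the notion of nondominated trade-off performance concrete, the following minimal sketch filters a set of (flight time, collision rate, energy) outcomes down to its Pareto front, treating all three objectives as quantities to minimize. The mode names come from the abstract; the numeric values, the extra "dominated" entry, and the function names `dominates` and `pareto_front` are illustrative assumptions, not results or code from this work.

```python
from typing import Dict, Tuple

# (flight time, collision rate, energy); lower is better on every axis.
Outcome = Tuple[float, float, float]


def dominates(a: Outcome, b: Outcome) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(outcomes: Dict[str, Outcome]) -> Dict[str, Outcome]:
    """Keep only the entries that no other entry dominates."""
    return {
        name: outcome
        for name, outcome in outcomes.items()
        if not any(
            dominates(other, outcome)
            for other_name, other in outcomes.items()
            if other_name != name
        )
    }


# Placeholder numbers chosen only for illustration (hypothetical, not measured).
example = {
    "aggressive": (60.0, 0.10, 120.0),
    "balanced": (75.0, 0.05, 100.0),
    "cautious": (95.0, 0.02, 110.0),
    "dominated": (100.0, 0.12, 130.0),  # worse than "aggressive" on all axes
}

front = pareto_front(example)
print(sorted(front))
```

With these placeholder values, the three named modes each win on at least one objective, so none dominates another, while the fourth entry is filtered out. A mode selector that always picks from such a front never gives up performance on one objective without gaining on another.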