A Two-Stage Reinforcement Learning Algorithm for AUV Path Planning Based on Trajectory Exploration and Sequence Modeling
Abstract: Path planning is essential for autonomous underwater vehicles (AUVs) to perform tasks. Many existing single-objective path planning methods rely on prior knowledge of the underwater environment. However, extracting such prior knowledge is challenging because the underwater environment is influenced by ocean currents, complex terrain, and other factors. This paper proposes a two-stage reinforcement learning (RL) algorithm based on trajectory exploration and sequence modeling, called the Soft Actor-Critic and Online Decision Transformer (SAC-ODT), which operates without prior knowledge. The algorithm uses SAC's exploration capability to generate training data for the ODT, and uses the ODT's policy-optimization capability to plan a smooth, energy-efficient, and safe path. In the first stage, a Multi-Reward Strategy Embedding (MRSE) method is designed to facilitate trajectory exploration with multiple strategies, enabling further training of an ODT with comprehensive decision-making capability. In the second stage, a Condition Prioritized Buffer Update and Sampling Strategy (CP-BUSS) is proposed to enhance the sensitivity of the ODT to the reward function, enabling adaptation to various tasks while accelerating the learning of high-quality paths. Experimental results demonstrate that, compared to existing RL-based benchmarks, SAC-ODT reduces path time and energy consumption by 2.7% and 2.5%, respectively, while improving path smoothness by 91.96%.

Note to Practitioners—This paper provides a new solution to the path planning problem of autonomous underwater vehicles (AUVs). Most existing path planning methods are designed for single-objective optimization or require large amounts of training data, making them unsuitable for complex underwater environments. To address these challenges, this paper proposes a two-stage AUV path planning method based on deep reinforcement learning.
An additional reinforcement learning model is trained to generate training data, mitigating the issue of sparse underwater datasets. The generated data are then used to train a reinforcement learning model that considers multiple objectives such as safety, energy efficiency, and smoothness. Furthermore, the proposed algorithm can be fine-tuned for different environments to adapt to various mission requirements, such as seabed exploration, environmental monitoring, and pipeline inspection. In future research, we will explore strategies for handling time-varying ocean currents and dynamic unknown obstacles, and investigate multi-AUV coordination to achieve more efficient and collaborative missions.
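The two-stage pipeline described above can be sketched in miniature: an exploratory stage collects trajectories under several reward weightings (a stand-in for MRSE's multi-strategy exploration), and a second stage keeps a return-prioritized buffer and samples from its top slice (a stand-in for CP-BUSS). All function and variable names below are illustrative; the toy random-walk rollout replaces the paper's actual SAC and ODT networks, which are not specified in the abstract.

```python
import random

def rollout(weights, steps=20, seed=None):
    """Toy exploratory rollout: a 1-D walk toward a goal at x=10.

    The reward mixes the three objectives named in the abstract:
    distance to goal, energy (action magnitude), and smoothness
    (change between consecutive actions).
    """
    rng = random.Random(seed)
    pos, ret, traj, prev_a = 0.0, 0.0, [], 0.0
    for _ in range(steps):
        a = rng.uniform(-1.0, 1.0)                    # exploratory action
        pos += a
        r = (-weights["dist"] * abs(10.0 - pos)       # progress toward goal
             - weights["energy"] * a * a              # energy penalty
             - weights["smooth"] * abs(a - prev_a))   # smoothness penalty
        ret += r
        traj.append((pos, a, r))
        prev_a = a
    return ret, traj

# Stage 1 (MRSE-like): several reward weightings yield diverse trajectories.
strategies = [
    {"dist": 1.0, "energy": 0.1, "smooth": 0.1},
    {"dist": 1.0, "energy": 0.5, "smooth": 0.1},
    {"dist": 1.0, "energy": 0.1, "smooth": 0.5},
]
buffer = [rollout(w, seed=i) for i, w in enumerate(strategies * 10)]

# Stage 2 (CP-BUSS-like): keep the buffer ordered by return and sample
# training batches from the top fraction, so the sequence model is
# fine-tuned on the highest-return trajectories first.
buffer.sort(key=lambda item: item[0], reverse=True)
top = buffer[: max(1, len(buffer) // 4)]              # prioritized slice
batch = random.Random(0).sample(top, k=min(4, len(top)))
```

The design choice mirrored here is that sequence models such as the ODT learn from whole trajectories conditioned on return, so biasing sampling toward high-return trajectories is what lets the second stage refine, rather than merely imitate, the first stage's exploratory data.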
External IDs: dblp:journals/tase/LiuTLXM25