Think Before Acting: The Necessity of Endowing Robot Terminals With the Ability to Fine-Tune Reinforcement Learning Policies
Abstract: Goal-Conditioned Reinforcement Learning (GCRL) has been widely applied in robotics. A typical workflow is to pre-train policies in a development environment and then deploy them on robots. Under this workflow, we find that policies trained with GCRL are discontinuous in goal space, meaning that a policy's evaluation results in the development environment cannot reliably predict its performance in the actual production environment. To ensure that robot terminals can complete tasks effectively, we propose the Think Before Acting (TBA) framework, which evaluates and fine-tunes policies on the robot terminal. Within TBA, the policy's performance on a goal is evaluated before execution; if the performance does not meet the requirements, the policy is fine-tuned on that goal. We validate TBA in experiments on velocity-vector control of fixed-wing UAVs. The results show that, for a well-pre-trained policy, fewer than 10^5 samples and less than 2 minutes of fine-tuning suffice to achieve satisfactory performance on the target goal.
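The evaluate-then-fine-tune loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`evaluate`, `fine_tune`), the success threshold, and the sample budget are all assumptions introduced here for clarity.

```python
def think_before_acting(policy, goal, evaluate, fine_tune,
                        success_threshold=0.9, max_samples=10**5):
    """Hypothetical sketch of the TBA loop: evaluate the pre-trained
    policy on the commanded goal, and fine-tune on that goal only if
    performance falls below the threshold or the sample budget runs out.
    `evaluate` and `fine_tune` are placeholders for user-supplied routines
    (e.g. success rate over rollouts, and a goal-conditioned RL update)."""
    score = evaluate(policy, goal)
    samples_used = 0
    while score < success_threshold and samples_used < max_samples:
        # fine_tune returns the updated policy and how many samples it consumed
        policy, n_samples = fine_tune(policy, goal)
        samples_used += n_samples
        score = evaluate(policy, goal)
    return policy, score
```

In this sketch, "think before acting" amounts to gating execution on an on-terminal evaluation, so fine-tuning cost is paid only for goals on which the pre-trained policy underperforms.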