TL;DR: A Q-Learning algorithm combined with an attenuation oscillation curve.
Abstract: Reinforcement learning can be applied to many fields, but balancing exploration and exploitation in the action-selection strategy remains an open problem. To solve the path optimization problem in an unknown environment, we propose a model based on the Q-Learning algorithm that balances the exploration stage and the exploitation stage. The model combines Q-Learning with an attenuation oscillation curve. It adjusts the time allocated to exploration and to exploitation according to the current iteration count and iteration state, which avoids both over-exploitation and over-exploration. In a path optimization simulation experiment, the Q-Learning algorithm based on oscillation attenuation is compared with the classical Q-Learning algorithm. The experiments show that, in the same obstacle-filled environment, the Q-Learning algorithm based on oscillation attenuation plans a path with fewer steps in a shorter search time and is able to escape local optima.
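The abstract does not give the form of the attenuation oscillation curve, so the following is only an illustrative sketch of the general idea: an epsilon-greedy Q-Learning agent whose exploration rate follows a damped oscillation over episodes. The schedule `oscillating_epsilon` (an exponential envelope multiplied by `|cos|`), all of its parameters, the 5x5 grid, the obstacle positions, and the reward values are assumptions made for illustration and are not taken from the paper. The sketch also depends only on the episode index, not on the iteration state that the abstract mentions.

```python
import math
import random

SIZE = 5
START, GOAL = (0, 0), (4, 4)
OBSTACLES = frozenset({(1, 1), (2, 3), (3, 1)})   # hypothetical layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right


def oscillating_epsilon(episode, eps_min=0.05, eps_max=1.0, decay=0.01, omega=0.3):
    """Assumed schedule: exponentially decaying envelope times |cos|.

    Exploration keeps returning in bursts while its peak shrinks over time.
    """
    envelope = math.exp(-decay * episode)
    return eps_min + (eps_max - eps_min) * envelope * abs(math.cos(omega * episode))


def step(state, action):
    """Deterministic grid move; walls and obstacles leave the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    blocked = not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in OBSTACLES
    if blocked:
        return state, -5.0, False
    if nxt == GOAL:
        return nxt, 100.0, True
    return nxt, -1.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-Learning with the oscillating exploration rate."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for episode in range(episodes):
        eps = oscillating_epsilon(episode)
        state = START
        for _ in range(max_steps):
            if rng.random() < eps:
                action = rng.randrange(len(ACTIONS))              # explore
            else:
                action = max(range(len(ACTIONS)), key=lambda a: q[state][a])  # exploit
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START and record visited cells."""
    state, path = START, [START]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _reward, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table))
```

Replacing `oscillating_epsilon` with a monotonically decaying schedule gives a classical epsilon-greedy baseline under the same assumptions, which is one way to set up the kind of comparison the abstract describes.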
Submission Number: 34