Discounted Sampling Policy Gradient for Robot Multi-objective Visual Control

Meng Xu, Qingfu Zhang, Jianping Wang

2021 (modified: 09 Nov 2022), EMO 2021

Abstract: Robot visual control often involves multiple objectives, such as achieving high efficiency, maintaining stability, and avoiding failure. This paper proposes a Vision-Based Control (VBC) method that combines the Discounted Sampling Policy Gradient (DSPG) with Cosine Annealing (CA) to achieve strong multi-objective control performance. In the proposed visual control framework, a DSPG learning agent learns a policy that estimates continuous kinematics for VBC. The deep policy maps each visual observation to a specific action in an end-to-end manner. Using shaped rewards from the environment, the DSPG agent updates the policy toward an optimal or near-optimal solution. The proposed VBC-DSPG model is optimized using a heuristic method. Experimental results demonstrate that the proposed method compares favorably with several classical competitors in the multi-objective visual control scenario.
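
The abstract names Cosine Annealing (CA) but does not say which quantity is annealed or how it is configured. As a minimal sketch, assuming CA is applied to a learning rate, the standard cosine annealing schedule can be written as below. The function name `cosine_annealing` and the values of `eta_max`, `eta_min`, and `total_steps` are illustrative assumptions, not taken from the paper.

```python
import math


def cosine_annealing(step, total_steps, eta_max=1e-3, eta_min=1e-5):
    """Standard cosine annealing from eta_max (step 0) to eta_min (final step).

    All parameter values are hypothetical; the paper's settings are not
    given in the abstract.
    """
    # Clamp the step so values outside [0, total_steps] stay on the schedule.
    step = min(max(step, 0), total_steps)
    # Cosine factor goes smoothly from 1.0 at step 0 to 0.0 at total_steps.
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return eta_min + (eta_max - eta_min) * cos_factor


# Example: a 100-step schedule (101 values including both endpoints).
schedule = [cosine_annealing(t, 100) for t in range(101)]
```

The schedule decays slowly at the start and end and fastest in the middle, which is the usual motivation for choosing it over a linear decay.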
