Abstract: Intelligent robots intended to engage with people in real life must be able to adapt to the varying preferences of their users. Through human-robot collaboration, robots can be taught personalized behaviors without a laborious, hand-crafted reward function. Instead, robots can learn reward functions from human style preferences between two robot movements, an approach called style-based reinforcement learning (SRL). However, existing SRL algorithms suffer from limited exploration of the reward and state spaces, low feedback efficiency, and poor performance on complex interactive tasks. We incorporate prior task information into SRL to improve its performance. Specifically, we decouple the task from the style in human-robot collaboration. We employ an imprecise task reward, derived from the task prior, to guide the robot toward more efficient task exploration. The robot's policy is then optimized with the reward learned by SRL to better match human styles. Reward shaping allows these two components to be fused naturally. Experimental results demonstrate that our approach is a feasible and effective way to achieve personalized human-robot collaboration.
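As a minimal illustrative sketch (our own notation, not taken verbatim from the paper), one common way to realize the reward shaping mentioned above is a potential-based formulation, in which the learned style reward $\hat{r}_{\mathrm{style}}$ is combined with a potential $\Phi$ derived from the imprecise task reward:

\[
\tilde{r}(s_t, a_t, s_{t+1}) \;=\; \hat{r}_{\mathrm{style}}(s_t, a_t) \;+\; \gamma\,\Phi(s_{t+1}) \;-\; \Phi(s_t),
\]

Under this standard assumption, the task prior accelerates exploration during training while leaving the optimal policy induced by the learned style reward unchanged.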