Abstract: This paper presents an algorithm that generates distributed, collision-free velocities for multiple robots while maintaining formation as much as possible. The adaptive formation problem is cast as a sequential decision-making problem and solved with reinforcement learning, which trains several distributed policies to avoid dynamic obstacles on top of consensus velocities. We construct the policy with Bayesian linear regression on top of a neural network (called BNL) to compute the uncertainty of state-action values efficiently for sequential decision making. Information-directed sampling is applied in our BNL policy to achieve efficient exploration. By further incorporating distributional reinforcement learning, we can estimate the intrinsic uncertainty of the state-action value globally and more accurately. For continuous control tasks, efficient exploration can be achieved by optimizing a policy against an action-value function sampled from the BNL model. Through experiments on contextual bandit and sequential decision-making tasks, we show that exploration with the BNL model improves both computational and sample efficiency. By augmenting the consensus velocities with our BNL policy, our multi-robot navigation experiments demonstrate that adaptive formation is achieved.
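To make the BNL idea concrete, the sketch below shows Bayesian linear regression over the penultimate-layer features of a network, yielding a posterior mean and variance for each state-action value and a sampled-value action rule. This is not the authors' code: the network shape, prior and noise scales, action count, and the Thompson-style sampling (in place of the paper's information-directed sampling) are all illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation): Bayesian linear
# regression on penultimate-layer features phi(s), giving a Gaussian posterior
# over last-layer weights and hence an uncertainty estimate per action value.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, FEATURE_DIM, N_ACTIONS = 4, 16, 3  # illustrative sizes

# Fixed random feature map standing in for a trained network body.
W1 = rng.normal(scale=0.5, size=(STATE_DIM, FEATURE_DIM))

def features(state):
    """Penultimate-layer activations phi(s)."""
    return np.tanh(state @ W1)

# Conjugate Gaussian prior over last-layer weights, one regressor per action:
# w_a ~ N(0, (1/alpha) I), observation noise variance 1/beta (assumed values).
alpha, beta = 1.0, 25.0
precision = [alpha * np.eye(FEATURE_DIM) for _ in range(N_ACTIONS)]
mean_times_prec = [np.zeros(FEATURE_DIM) for _ in range(N_ACTIONS)]

def update(state, action, target):
    """Rank-one Bayesian update with an observed value target (e.g. a TD target)."""
    phi = features(state)
    precision[action] += beta * np.outer(phi, phi)
    mean_times_prec[action] += beta * target * phi

def q_posterior(state, action):
    """Posterior mean and variance of Q(s, a) under the linear-Gaussian model."""
    phi = features(state)
    cov = np.linalg.inv(precision[action])
    mu = cov @ mean_times_prec[action]
    return float(phi @ mu), float(phi @ cov @ phi + 1.0 / beta)

def act(state):
    """Pick the action maximizing a value sampled from the posterior.
    The paper's information-directed sampling would instead trade off
    expected regret against information gain when selecting actions."""
    samples = []
    for a in range(N_ACTIONS):
        mu, var = q_posterior(state, a)
        samples.append(rng.normal(mu, np.sqrt(var)))
    return int(np.argmax(samples))

# Tiny usage example on synthetic data.
s = rng.normal(size=STATE_DIM)
update(s, act(s), target=1.0)
print(act(s), q_posterior(s, 0))
```

Because the posterior is conjugate, each update is a rank-one modification of a per-action precision matrix, which is what makes the uncertainty computation efficient relative to maintaining a full Bayesian neural network.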