Abstract: Multi-agent Reinforcement Learning (MARL) has drawn wide attention in recent years, as many complex real-world scenarios can be abstracted as multi-agent systems (MAS). The partially observable cooperative multi-agent setting, in which agents must learn to coordinate with teammates by conditioning their actions on their own partial observations while sharing a single global reward at each time-step, is the setting most studied by existing MARL algorithms that use centralized training with decentralized execution. One key challenge is how to perform effective, directed exploration. In this work, we propose a new agent network, the Multi-branch Ensemble Agent Network (MEAN), to encourage such directed exploration. We evaluate MEAN with existing Q-learning-based MARL algorithms on StarCraft II micro-management challenges. Extensive evaluations show that algorithms equipped with MEAN achieve substantially better performance on both homogeneous and heterogeneous scenarios than the original algorithms.