- TL;DR: We propose learning a collective policy solely in simulation, so that agents' biases (analogous to human cognitive biases) complement one another.
- Abstract: We consider a setting in which agents are subject to biases when they internalise an environment. The agents have different biases, all of which result in imperfect evidence for choosing optimal actions. Through its interactions, each agent asynchronously learns its own predictive model of the environment and forms a virtual simulation in which it can play out entire episodes. In this research, we focus on developing a collective policy that is trained solely inside the agents' simulations and can then be transferred to the real-world environment. The key idea is to let agents imagine together: they take turns hosting virtual episodes in which all agents participate, each interacting through its own biased representation. Because the agents' biases differ, a collective policy developed while sequentially visiting the internal simulations lets the agents compensate for one another's shortcomings. In our experiment, the collective policies consistently achieve significantly higher returns than the best individually trained policies.
- Code: http://s000.tinyupload.com/?file_id=54935721167326296555
- Keywords: collective policy, biased representation, model-based RL, simulation, imagination, virtual environment
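
The turn-taking idea in the abstract can be illustrated with a deliberately tiny sketch. This is not the authors' method: it assumes a one-step task with three actions, represents each agent's internal simulation as a fixed list of biased reward estimates, and uses a shared value table as the "collective policy". The names `host_episode`, `train_collective`, `greedy`, `sim_a`, and `sim_b`, as well as all numbers, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def host_episode(biased_rewards: Sequence[float], values: List[float], lr: float) -> None:
    """One virtual episode hosted in a single agent's simulation.

    The shared action values are nudged toward the host's (biased)
    reward estimate for every action.
    """
    for action, reward in enumerate(biased_rewards):
        values[action] += lr * (reward - values[action])


def train_collective(simulations: Sequence[Sequence[float]], n_rounds: int, lr: float = 0.1) -> List[float]:
    """Train one shared value table while agents take turns hosting."""
    values = [0.0] * len(simulations[0])
    for _ in range(n_rounds):
        for sim in simulations:  # sequentially visit each internal simulation
            host_episode(sim, values, lr)
    return values


def greedy(values: Sequence[float]) -> int:
    """Index of the highest-valued action."""
    return max(range(len(values)), key=lambda a: values[a])


# Hypothetical setup: the assumed true rewards are [1.0, 1.5, 0.8],
# so action 1 is best. Agent A over-values action 0 and agent B
# over-values action 2; both are accurate elsewhere.
sim_a = [1.8, 1.5, 0.8]
sim_b = [1.0, 1.5, 1.9]

solo_a = greedy(train_collective([sim_a], n_rounds=200))
solo_b = greedy(train_collective([sim_b], n_rounds=200))
together = greedy(train_collective([sim_a, sim_b], n_rounds=200))
print(solo_a, solo_b, together)
```

In this constructed toy, a policy trained only in agent A's or agent B's simulation picks the action that agent over-values, while the policy trained across both simulations picks action 1. The biases were chosen by hand to be complementary, so the sketch shows the mechanism only and says nothing about the reported results.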