On the impact of lowering temperature on learning with behavior clones

Published: 01 Jun 2024, Last Modified: 17 Jun 2024, CoCoMARL 2024 Poster, CC BY 4.0
Keywords: reinforcement learning, offline learning, behavior clone, model-based reinforcement learning
Abstract: In an environment requiring cooperation with unknown external agents, an agent must adapt and adjust its policy according to the external agent's behavior. It cannot simply adopt a self-optimal policy and assume the other agent is similarly optimal. Even when the external agent is highly adaptive, e.g., a human, this assumption can still lead to sub-optimal performance. Limited access to the external agent further compounds the challenge, rendering direct training impractical. To address this, a behavior clone (proxy) can be built from observations, and the agent is then trained offline with the behavior clone as its partner. However, the accuracy of the behavior clone is often not guaranteed, constrained by factors such as the number of observations or the clone's capacity. This inaccuracy can lead to a decline in the agent's performance or even outright training failure. This paper first demonstrates that learning from clones can degrade the agent's performance. It then shows that lowering the temperature of the clone's behavior during training mitigates this drop. These findings offer insights that could contribute to improving learning from behavior clones.
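The abstract's key mechanism is temperature scaling of the clone's action distribution during the agent's offline training. The sketch below is a minimal, hypothetical illustration of that idea and is not the paper's implementation; the function name, logit values, and temperature settings are assumptions chosen only to show how a lower temperature makes the clone partner act more deterministically.

```python
import numpy as np

def sample_clone_action(logits, temperature=1.0, rng=None):
    """Sample an action from a behavior clone's logits with temperature scaling.

    Lower temperatures sharpen the distribution toward the clone's most likely
    action, suppressing the noisy, low-probability actions an imperfect clone
    might otherwise emit while serving as the training partner.
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    scaled -= scaled.max()            # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Hypothetical usage: the same clone logits sampled at full vs. lowered temperature.
logits = [2.0, 1.0, 0.1]
a_full = sample_clone_action(logits, temperature=1.0)   # stochastic clone partner
a_cool = sample_clone_action(logits, temperature=0.1)   # near-greedy clone partner
```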
Submission Number: 14