Abstract: In reinforcement learning, reward-driven feature learning directly from high-dimensional images faces two challenges: sample efficiency for solving control tasks and generalization to unseen observations. Prior works have addressed these issues by learning representations from pixel inputs, but their representations were either vulnerable to the high diversity inherent in environments or failed to capture the characteristics essential for solving control tasks. To mitigate these issues, we propose a novel contrastive representation method, Action-Driven Auxiliary Task (ADAT), which forces the representation to concentrate on features essential for deciding actions and to ignore control-irrelevant details. Using the augmented state-action dictionary of ADAT, the agent learns representations that maximize agreement between observations sharing the same actions. The proposed method significantly outperforms model-free and model-based algorithms on Atari and OpenAI ProcGen, widely used benchmarks for sample efficiency and generalization.
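To make the core objective concrete, below is a minimal sketch (not the paper's implementation) of an action-driven contrastive loss in PyTorch, assuming an InfoNCE-style formulation in which dictionary entries that share the anchor's action are treated as positives; all names (`action_driven_contrastive_loss`, `temperature`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def action_driven_contrastive_loss(z_anchor, z_batch, actions_anchor,
                                   actions_batch, temperature=0.1):
    """InfoNCE-style loss: embeddings sharing an action are pulled together.

    z_anchor:       (B, D) embeddings of augmented anchor observations
    z_batch:        (N, D) embeddings of dictionary observations
    actions_anchor: (B,)   integer action labels for the anchors
    actions_batch:  (N,)   integer action labels for the dictionary entries
    """
    z_anchor = F.normalize(z_anchor, dim=1)
    z_batch = F.normalize(z_batch, dim=1)
    # Cosine similarities scaled by temperature.
    logits = z_anchor @ z_batch.T / temperature                    # (B, N)
    # Positives are dictionary entries with the same action label.
    pos_mask = actions_anchor[:, None] == actions_batch[None, :]   # (B, N)
    # Log-probability of each candidate; average over positives per anchor.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()

# Toy usage with random embeddings standing in for two augmented views.
z1, z2 = torch.randn(8, 64), torch.randn(8, 64)
actions = torch.randint(0, 4, (8,))
print(action_driven_contrastive_loss(z1, z2, actions, actions).item())
```

Treating same-action observations as positives plays the role of a label in supervised contrastive learning, steering the encoder toward features that matter for action selection while discarding control-irrelevant detail.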