TL;DR: Learning emergent behaviors by minimizing Bayesian surprise with RL in natural environments that contain entropy.
Abstract: All living organisms struggle against the forces of nature to carve out niches where
they can maintain relative stasis. We propose that such a search for order amidst
chaos might offer a unifying principle for the emergence of useful behaviors in
artificial agents. We formalize this idea into an unsupervised reinforcement learning
method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the
objective of maximizing the probability of observed states under a model trained on
all previously seen states. The resulting agents acquire proactive behaviors that seek out
and maintain stable states, such as balancing and damage avoidance, which
are closely tied to the affordances of the environment and its prevailing sources
of entropy, such as winds, earthquakes, and other agents. We demonstrate that
our surprise minimizing agents can successfully play Tetris, Doom, and control
a humanoid to avoid falls, without any task-specific reward supervision. We
further show that SMiRL can be used as an unsupervised pre-training objective
that substantially accelerates subsequent reward-driven learning.
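For concreteness, the sketch below illustrates the surprise-minimization reward described in the abstract, under the assumption that the density model over visited states is a diagonal Gaussian fit to the agent's full state history; the class and method names (SMiRLReward, update, reward) are illustrative, not the paper's implementation.

```python
import numpy as np

class SMiRLReward:
    """Minimal sketch: reward an agent for visiting states that are likely
    under a density model fit to all previously seen states (assumed here
    to be an independent Gaussian over state features)."""

    def __init__(self, state_dim, eps=1e-6):
        self.sum = np.zeros(state_dim)
        self.sum_sq = np.zeros(state_dim)
        self.count = 0
        self.eps = eps  # variance floor for numerical stability

    def update(self, state):
        # Add the newly observed state to the running sufficient statistics.
        self.sum += state
        self.sum_sq += state ** 2
        self.count += 1

    def reward(self, state):
        # Log-likelihood of the current state under the model fit to the
        # history: higher when the state is unsurprising given past experience.
        n = max(self.count, 1)
        mean = self.sum / n
        var = self.sum_sq / n - mean ** 2 + self.eps
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (state - mean) ** 2 / var)
```

In use, the agent would receive reward(s_t) at each environment step, then call update(s_t) to refresh the model; any standard RL algorithm can then be trained on this intrinsic reward in place of a task reward.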
Keywords: intrinsic motivation, reinforcement learning, unsupervised RL