Empowerment, Free Energy Principle and Maximum Occupancy Principle Compared

Published: 27 Oct 2023, Last Modified: 20 Nov 2023, Venue: InfoCog@NeurIPS2023 Poster
Keywords: empowerment, free energy, maximum occupancy, Bellman equation, reward-free
TL;DR: This paper compares several reward-free approaches in RL for the first time, providing new tools and insights
Abstract: While reward maximization in reinforcement learning has led to impressive achievements in several games and artificial environments, animals seem to be driven also by intrinsic signals, such as curiosity, and not only by extrinsic rewards. Several reward-free approaches have emerged in cognitive neuroscience and artificial intelligence that primarily use signals other than extrinsic rewards to guide exploration and ultimately drive behavior, but a comparison between these approaches is lacking. Here we focus on two popular reward-free approaches, empowerment (MPOW) and the free energy principle (FEP), and a recently developed one, the maximum occupancy principle (MOP), and compare them in sequential problems and fully observable environments. We find that MPOW shows a preference for unstable fixed points of the dynamical system that defines the agent and environment. FEP is shown to be equivalent to reward maximization in certain cases. Neither of these two principles seems to consistently generate variable behavior: behavior collapses into a small repertoire of possible action-state trajectories or fixed points. Collapse to an optimal deterministic policy can be proved in specific, recent implementations of FEP, the only exception being policy degeneracy due to ties. In contrast, MOP consistently generates variable action-state trajectories. In two simple environments, a balancing cartpole and a grid world, we find that both MPOW and FEP agents stick to a relatively small set of states and actions, while MOP agents generate a sort of exploratory, dancing-like motion.
Submission Number: 11