Sample-Efficient Goal-Conditioned Reinforcement Learning via Predictive Information Bottleneck for Goal Representation Learning
Abstract: We propose Predictive Information bottleneck for Goal representation learning (PI-Goal), a self-supervised method for sample-efficient goal-conditioned reinforcement learning (RL). In goal-conditioned RL, an agent learns from reward signals to reach commanded goals. A goal may be given in a noisy or abstract form, which jeopardizes sample efficiency. Previous methods typically assume that the agent can map a state to an achievable goal. In this work, we consider a setting in which the goal space is unknown to the agent, so the agent cannot recognize a goal in a specific state (referred to as a goal state) until the goal is commanded. PI-Goal learns a goal representation that contains only the predictive information of a goal state, i.e., the mutual information between the current state and a future state, while guaranteeing the optimality of the learned policy. Experimental results show that PI-Goal consistently outperforms baseline methods in tasks with unknown goal spaces, e.g., object manipulation, object search, and embodied question answering.
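To make the predictive-information objective concrete, below is a minimal PyTorch sketch of one common way to maximize a lower bound on the mutual information between a current state and a future state, namely an InfoNCE contrastive loss over encoded state pairs. All names (`GoalEncoder`, `infonce_predictive_info`), architectures, and hyperparameters here are illustrative assumptions, not the paper's actual implementation, which may use a different mutual-information estimator or bottleneck term.

```python
# Hypothetical sketch: InfoNCE lower bound on the predictive information
# I(s_t; s_{t+k}), used to train a goal-representation encoder.
# Names and hyperparameters are assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GoalEncoder(nn.Module):
    """Maps raw states to compact goal representations z = phi(s)."""
    def __init__(self, state_dim: int, repr_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def infonce_predictive_info(z_t: torch.Tensor, z_future: torch.Tensor) -> torch.Tensor:
    """InfoNCE loss whose negation lower-bounds I(s_t; s_{t+k}).

    Positive pairs are (z_t[i], z_future[i]); all other rows in the
    batch act as negatives.
    """
    logits = z_t @ z_future.t()             # (B, B) pairwise similarities
    labels = torch.arange(z_t.size(0))      # diagonal entries are positives
    return F.cross_entropy(logits, labels)  # minimizing tightens the MI bound

# Usage: encode (s_t, s_{t+k}) pairs sampled from replay, descend the loss.
encoder = GoalEncoder(state_dim=17, repr_dim=32)
opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)
s_t, s_future = torch.randn(64, 17), torch.randn(64, 17)  # placeholder batch
loss = infonce_predictive_info(encoder(s_t), encoder(s_future))
opt.zero_grad(); loss.backward(); opt.step()
```

An information-bottleneck variant would additionally penalize how much the representation retains about the raw state (e.g., via a KL regularizer on a stochastic encoder), keeping only the future-predictive content; the sketch above shows only the predictive (maximization) side of such an objective.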