Discovery in Reinforcement Learning

Published: 01 Jan 2022, Last Modified: 15 May 2025, License: CC BY-SA 4.0
Abstract: This thesis focuses on Reinforcement Learning (RL), which considers an agent that makes sequential decisions in an environment to maximize its reward. Recent breakthroughs on domains such as Go, Atari games, and robotics are a result of combining advances from deep learning with RL algorithms. This combination of RL with deep neural networks has moved the field from hand-designing feature representations to learning features directly from raw data (e.g., pixel images). Despite these advancements, RL practitioners still need to define what predictive knowledge an agent should learn, rather than the agent autonomously discovering such knowledge. Developing the ability to discover what predictive knowledge is useful to learn about could allow RL agents to learn more efficiently. In this thesis, we identify some of the challenges in designing agents with this ability, and propose and evaluate methods to address them.

In the first part of this thesis, we present an approach for discovering multiple subgoals to learn optimal policies from a single stream of experience. We show that this approach is useful both as a pre-training procedure and as a source of auxiliary-task updates when there is a main task of interest.

In the second part of this thesis, we present three approaches to discover useful knowledge of various forms to maximize the agent's task performance. The first introduces an architecture and an associated meta-gradient algorithm to discover predictive questions about the agent's experience to drive representation learning in RL agents; we show that it leads to faster learning when the discovered questions are used as auxiliary tasks. The second introduces another meta-gradient approach to discover temporal abstractions in the form of options, which enables a hierarchical RL agent to learn faster on new, unseen tasks. The third presents an algorithm to select a small number of affordances, in the form of actions or options from a continuous space, to improve the performance of a model-based planning agent on hierarchical tasks.

The final part studies the problem of learning and maintaining accurate knowledge in the form of option-conditional predictions in a partially observable environment. Specifically, we implement and empirically demonstrate the thought experiment of Ring [2021], which provided a detailed blueprint of how high-level, abstract knowledge could be represented as layered predictions of the agent's sensorimotor stream in navigation environments. In addition to this empirical demonstration, we introduce an approach to discover those option-conditional predictions, which were previously hand-defined, and demonstrate the feasibility of this approach.
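The second part of the abstract refers to meta-gradient algorithms that discover predictive questions whose answers are then learned as auxiliary tasks. As an illustration only, and not code from the thesis, the sketch below shows the generic meta-gradient pattern under simplifying assumptions: a linear predictor, a hypothetical question generator parameterized by eta, scalar regression targets, and illustrative step sizes (inner_lr and the 0.05 meta step size). The inner update trains the agent's parameters on the main loss plus the auxiliary loss defined by the current question; the meta-gradient of the post-update main-task loss with respect to eta then adapts the question itself.

# Minimal sketch (assumed, not the thesis implementation) of meta-gradient
# discovery of an auxiliary prediction target, written with JAX.
import jax
import jax.numpy as jnp

def aux_target(eta, x):
    # Hypothetical "question": a learned linear re-weighting of the observation.
    return jnp.dot(eta, x)

def inner_loss(theta, eta, x, y):
    # Main prediction loss plus an auxiliary loss on the discovered target.
    main = (jnp.dot(theta["w"], x) - y) ** 2
    aux = (jnp.dot(theta["w_aux"], x) - aux_target(eta, x)) ** 2
    return main + aux

def post_update_main_loss(eta, theta, x, y, x_val, y_val, inner_lr=0.1):
    # One inner gradient step on the combined loss, then evaluate the MAIN loss
    # on held-out data; differentiating this w.r.t. eta gives the meta-gradient.
    grads = jax.grad(inner_loss)(theta, eta, x, y)
    theta_new = {k: theta[k] - inner_lr * grads[k] for k in theta}
    return (jnp.dot(theta_new["w"], x_val) - y_val) ** 2

meta_grad_fn = jax.grad(post_update_main_loss)

# Toy usage: random data and a few meta-updates of the question parameters eta.
# A full agent would also keep the inner update to theta; it is omitted here.
key = jax.random.PRNGKey(0)
dim = 4
theta = {"w": jnp.zeros(dim), "w_aux": jnp.zeros(dim)}
eta = jnp.ones(dim) / dim
for step in range(5):
    key, k1, k2 = jax.random.split(key, 3)
    x, x_val = jax.random.normal(k1, (dim,)), jax.random.normal(k2, (dim,))
    y, y_val = jnp.sum(x), jnp.sum(x_val)  # toy main-task targets
    eta = eta - 0.05 * meta_grad_fn(eta, theta, x, y, x_val, y_val)

The same nested structure, with the inner loss, the post-update evaluation, and the parameters being meta-learned swapped out, also covers the option-discovery and affordance-selection settings described above, though the thesis methods themselves differ in their details.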