Keywords: continual learning, regret minimization, reinforcement learning
TL;DR: We extend regret minimization by conditioning on the agent's previous actions, which allows it to model the impact of those actions on the environment.
Abstract: Continual learning is a setting in which a learning algorithm must constantly adapt.
Modern reinforcement learning algorithms have demonstrated strong performance across a wide range of problems.
However, certain assumptions they make about the environment are violated in the continual learning setting.
We thus turn to regret minimization algorithms, which have strong hindsight performance guarantees while making minimal assumptions about the environment.
We present a novel framework that extends the guarantees of a regret minimizer to recent history.
In particular, this allows the agent to model the impact of its own actions on the environment and to adapt accordingly.
We combine our framework with regret minimizers that can handle continuous observations and maximize expected reward.
We thus obtain the best of both worlds: an algorithm with strong hindsight guarantees that simultaneously maximizes expected reward, akin to reinforcement learning.
We study the advantages of our algorithm in small, illustrative environments.
Submission Number: 14