Abstract: To achieve the ambitious goals of artificial intelligence, reinforcement learning must include planning with a model of the world that is abstract in state and time. Deep learning has made progress in state abstraction, but, although the theory of time abstraction has been extensively developed based on the options framework, in practice options have rarely been used in planning. One reason for this is that the space of possible options is immense, and the methods previously proposed for option discovery do not take into account how the option models will be used in planning. Options are typically discovered by posing subsidiary tasks, such as reaching a bottleneck state or maximizing a sensory signal other than the reward. Each subtask is solved to produce an option, and then a model of the option is learned and made available to the planning process. The subtasks proposed in most previous work ignore the reward on the original problem, whereas we propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops. We show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms. Reward-respecting subtasks strongly constrain the space of options and thereby also provide a partial solution to the problem of option discovery. Finally, we show how the algorithms for learning values, policies, options, and models can be unified using general value functions.
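As an illustrative sketch (in assumed notation, not necessarily the paper's own), the return for a reward-respecting subtask of the kind described above can be written as the original rewards accumulated until the option stops, plus a bonus proportional to a chosen state feature at the stopping state:

$$ G_t \;=\; R_{t+1} + R_{t+2} + \cdots + R_{\tau} \;+\; w\, x_i(S_\tau), $$

where $\tau$ is the time at which the option stops, $x_i(S_\tau)$ is the chosen feature of the state at stopping, and $w$ is a bonus weight; the symbols $\tau$, $x_i$, and $w$ are assumptions introduced here for illustration.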