Abstract: Applications of reinforcement learning to continuous control tasks often rely on a steady, informative reward signal. In videogames, however, tasks may be far easier to specify through a binary reward that indicates success or failure. In the absence of a steady, guiding reward, the agent may struggle to explore efficiently, especially if effective exploration requires strong coordination between actions. In this paper, we show empirically that this issue may be mitigated by exploring over an abstract action set, using hierarchically composed parameterized skills. We experiment in two tasks with sparse rewards in a continuous control environment based on the arcade game Asteroids. Compared to a flat learner that explores symmetrically over low-level actions, our agent explores a greater variety of useful actions, and its long-term performance on both tasks is superior.
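The contrast the abstract draws between flat exploration over low-level actions and exploration over hierarchically composed parameterized skills can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: the action format (turn rate, thrust), the skill definition, and all names are hypothetical, chosen only to show how sampling a few abstract skill parameters unrolls into temporally coordinated low-level actions, whereas a flat learner samples each low-level action independently.

```python
# Illustrative sketch only -- assumed action format and a made-up skill,
# not the paper's agent or environment.
import random
from typing import Iterator, Tuple

Action = Tuple[float, float]  # (turn_rate in [-1, 1], thrust in [0, 1]); assumed format


def flat_exploration_step() -> Action:
    """Symmetric exploration over low-level actions: each dimension sampled independently."""
    return (random.uniform(-1.0, 1.0), random.uniform(0.0, 1.0))


def rotate_then_thrust_skill(target_turn: float, thrust_steps: int) -> Iterator[Action]:
    """A hypothetical parameterized skill: turn at a chosen rate for a few steps,
    then thrust forward. Exploration happens over (target_turn, thrust_steps)
    rather than over individual low-level actions."""
    for _ in range(3):             # commit to the chosen turn direction
        yield (target_turn, 0.0)
    for _ in range(thrust_steps):  # then thrust along the new heading
        yield (0.0, 1.0)


if __name__ == "__main__":
    # Flat learner: a rollout of uncorrelated low-level actions.
    flat_rollout = [flat_exploration_step() for _ in range(8)]

    # Skill-based learner: sample two abstract parameters, unroll into coordinated actions.
    params = (random.uniform(-1.0, 1.0), random.randint(2, 5))
    skill_rollout = list(rotate_then_thrust_skill(*params))

    print("flat  :", [tuple(round(x, 2) for x in a) for a in flat_rollout])
    print("skill :", skill_rollout)
```

Under this toy framing, the skill-based rollout concentrates exploration on coordinated sequences (turn, then thrust), which is the kind of structured behavior the abstract argues is hard to discover by perturbing low-level actions independently.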