Abstract: Reinforcement learning (RL) typically assumes repeated resets to provide an agent
with diverse and unbiased experiences. These resets require significant human intervention
and result in poor training efficiency in real-world settings. Autonomous
RL (ARL) addresses this challenge by jointly training forward and reset policies.
While recent ARL algorithms have shown promise in reducing human intervention,
they assume narrow support over the distributions of initial or goal states and rely
on task-specific knowledge to identify irreversible states. In this paper, we propose
a robust and scalable ARL algorithm, called RSA, that enables an agent to handle
diverse initial and goal states and to avoid irreversible states without task-specific
knowledge. RSA generates a curriculum by identifying informative states based
on the learning progress of an agent. We hypothesize that informative states are
neither overly difficult nor trivially easy for the agent being trained. To detect
and avoid irreversible states without task-specific knowledge, RSA encodes the
behaviors exhibited in those states rather than the states themselves. Experimental
results demonstrate that RSA outperforms existing ARL algorithms while requiring fewer
manual resets in both reversible and irreversible environments.