- Keywords: planning, model learning, distance learning, reinforcement learning, robotics
- Abstract: A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when rewards are not provided and $\ell_2$ distances in pixel space are not meaningful. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model and a dynamical distance function learned with model-free reinforcement learning. This approach trains entirely on offline, unlabeled data, making it practical to scale to large and diverse datasets. On several challenging robotic manipulation tasks with only offline, unlabeled data, we find that our algorithm compares favorably to prior model-based and model-free reinforcement learning methods. In ablation experiments, we additionally identify important factors for learning effective distances.
- One-sentence Summary: We combine model-based planning with dynamical distance learning to solve visual goal-reaching tasks, using random, unlabeled experience.
- Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
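The planning scheme sketched in the abstract — rolling candidate action sequences through a learned dynamics model and scoring terminal states with a learned dynamical distance to the goal — can be illustrated with a minimal random-shooting planner. This is a toy sketch, not the paper's implementation: both stand-in functions and all names (`dynamics_model`, `dynamical_distance`, `plan_action`) are hypothetical, and the paper's actual models are learned from visual data.

```python
import numpy as np

def dynamics_model(state, action):
    """Stand-in for a learned dynamics model s' = f(s, a) (toy linear dynamics)."""
    return state + action

def dynamical_distance(state, goal):
    """Stand-in for a learned dynamical distance d(s, g); the paper learns
    such a function with model-free RL instead of using a fixed metric."""
    return np.linalg.norm(state - goal)

def plan_action(state, goal, horizon=5, n_samples=256, seed=None):
    """Random-shooting planner: sample action sequences, roll each through the
    dynamics model, and return the first action of the sequence whose final
    state has the smallest learned distance to the goal."""
    rng = np.random.default_rng(seed)
    actions = rng.uniform(-1.0, 1.0, size=(n_samples, horizon, state.shape[0]))
    costs = np.empty(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            s = dynamics_model(s, actions[i, t])
        costs[i] = dynamical_distance(s, goal)
    return actions[np.argmin(costs), 0]

state, goal = np.zeros(2), np.array([3.0, -1.0])
first_action = plan_action(state, goal, seed=0)
```

In practice the sampling step would be replaced by a more sample-efficient optimizer (e.g. the cross-entropy method), but the structure — dynamics rollouts scored by a learned distance rather than pixel-space error — is the idea the abstract describes.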