Abstract: Hierarchical reinforcement learning captures sub-task structure to learn modular policies that can be quickly adapted to new tasks. While hierarchies can be learned jointly with policies, doing so requires substantial environment interaction. Traditional approaches need less data but typically depend on sub-task labels to build a task hierarchy. We propose a semi-supervised constrained clustering approach that alleviates both the labeling and the interaction requirements. Our approach combines limited supervision with an arbitrary set of weak constraints, obtained purely from observations, which are jointly optimized to produce a clustering of states into sub-tasks. We demonstrate improved performance on two visual reinforcement learning tasks.