Keywords: reinforcement learning, unsupervised environment design, open-endedness, benchmark, generalist agents.
TL;DR: BuilderBench is a benchmark for research towards generalist agents that learn to solve diverse and complex tasks via interaction with a fast and open-ended environment.
Abstract: Today’s AI models learn primarily through mimicry and sharpening, so it is
not surprising that they struggle to solve problems beyond the limits set by
existing data. To solve novel problems, agents should acquire skills for exploring
and learning through experience. Finding a scalable learning mechanism for
developing agents that learn through interaction remains a major open problem.
In this work, we introduce BuilderBench, a benchmark to accelerate research into
agent pre-training that centers open-ended exploration. BuilderBench requires
agents to learn how to build any structure using blocks. BuilderBench is equipped
with (1) a hardware-accelerated simulator of a robotic agent interacting with
various physical blocks, and (2) a task suite with over 50 diverse target structures
that are carefully curated to test an understanding of physics, mathematics, and
long-horizon planning. During training, agents have to explore and learn general
principles about the environment without any external supervision. During evaluation,
agents have to build the handmade and unseen target structures from the
task suite. Solving these tasks requires a sort of embodied reasoning that is not
reflected in words, but rather in actions, experimenting with different strategies
and piecing them together. Our experiments show that many of these tasks
challenge the current iteration of algorithms. Hence, we also provide a “training
wheels” protocol, in which agents are trained and evaluated to build a single
target structure from the task suite. Finally, we provide clean implementations of
seven different algorithms as a reference point for researchers.
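
The two protocols described in the abstract can be contrasted with a minimal sketch. Everything here is a hypothetical stand-in chosen for illustration: `ToyBlockEnv`, `RandomAgent`, `run_episode`, and both protocol functions are assumptions, not the BuilderBench simulator or its API, and the toy "structure" is just a tuple of block positions on a line. The only point is the difference in what the agent sees during training: no target in the self-supervised protocol, and the single evaluation target in the "training wheels" protocol.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[int, ...]  # toy stand-in: position of each block on a line


class ToyBlockEnv:
    """Hypothetical toy environment (NOT the BuilderBench simulator)."""

    def __init__(self, num_blocks: int, size: int = 5):
        self.num_blocks, self.size = num_blocks, size
        self.state: State = tuple(0 for _ in range(num_blocks))

    def reset(self) -> State:
        self.state = tuple(0 for _ in range(self.num_blocks))
        return self.state

    def step(self, action: Tuple[int, int]) -> State:
        block, delta = action  # move one block by -1 or +1, clipped to the line
        pos = list(self.state)
        pos[block] = min(self.size - 1, max(0, pos[block] + delta))
        self.state = tuple(pos)
        return self.state


class RandomAgent:
    """Placeholder agent: acts randomly and learns nothing."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def act(self, state: State, goal: Optional[State], num_blocks: int) -> Tuple[int, int]:
        return (self.rng.randrange(num_blocks), self.rng.choice((-1, 1)))


def run_episode(env: ToyBlockEnv, agent: RandomAgent,
                goal: Optional[State], horizon: int) -> bool:
    """Roll out one episode; return True if the goal structure is reached."""
    state = env.reset()
    for _ in range(horizon):
        if state == goal:
            return True
        state = env.step(agent.act(state, goal, env.num_blocks))
    return state == goal


def self_supervised_protocol(env, agent, eval_targets: List[State],
                             train_episodes: int, horizon: int) -> List[bool]:
    # Training: the agent explores with no target from the task suite (goal=None).
    for _ in range(train_episodes):
        run_episode(env, agent, None, horizon)
    # Evaluation: the unseen targets are revealed only now.
    return [run_episode(env, agent, t, horizon) for t in eval_targets]


def training_wheels_protocol(env, agent, target: State,
                             train_episodes: int, horizon: int) -> bool:
    # Training and evaluation both use the same single target structure.
    for _ in range(train_episodes):
        run_episode(env, agent, target, horizon)
    return run_episode(env, agent, target, horizon)
```

In this sketch a learning agent would replace `RandomAgent` and update itself inside the training loops; the protocol functions themselves would stay the same.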
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 19521