Supplementary website for
"PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators"

Anonymous CoRL authors

PoliFormer is a transformer-based policy, trained with RL at scale in simulation, that achieves masterful navigation abilities in the real world.

Real-world examples

Here we present a number of real-world examples filmed in a robot testing lab. All results are collected using our PoliFormer agent, which was trained in simulation with ground-truth detections; in these real-world examples, detections are generated using Detic, an open-vocabulary object detector. The agent's RGB navigation inputs are shown, along with a third-person perspective for some examples. All videos are sped up by up to 20x for ease of viewing.

Floorplan of the real-world environment used for these qualitative examples.

Find an apple (LoCoBot)

PoliFormer finds an apple after navigating down a long hallway with many obstacles, including a chair that moves during the trajectory.

Find a book with title "Humans" (Stretch RE-1)

PoliFormer ignores the book it is looking at when the episode begins and searches multiple rooms until it finds the book with title "Humans". Please see the main paper for a close-up of the book in question; space constraints on the supplementary materials prevent us from uploading a high-resolution video of this trajectory.

Find the kitchen (Stretch RE-1)

Starting from a bedroom, PoliFormer explores, correctly avoids entering a bathroom, and finally finds the kitchen.

Find a sofa, book, toilet, and houseplant (Stretch RE-1)

PoliFormer is able to find multiple objects in a single episode. Here it initially finds a sofa and book, then a houseplant, and finally a toilet.

Follow the toy truck (Stretch RE-1)

PoliFormer follows a toy truck as it moves through multiple rooms in an indoor environment. Note that PoliFormer is not trained in dynamic environments but is nevertheless able to navigate while its target moves.

Follow the person (Stretch RE-1)

Similar to the above example, PoliFormer follows a person as they move down a hallway and into a kitchen.

Simulation examples

Here we show multiple examples of PoliFormer's behavior in simulation. In addition to the agent's RGB camera input, we also display the probabilities the agent assigns to each of its available actions. For the Stretch agent we show two RGB images side by side: the first (left) is the agent's RGB camera input, and the second (right) comes from a "manipulation" camera that is positioned 90 degrees clockwise from the agent's front-facing camera. The manipulation camera is purely for visualization; our agent only sees the left image during training and inference.
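
As a rough illustration of how a per-action probability display like the one in these videos could be produced, the sketch below assumes the policy outputs one raw score (logit) per discrete action and converts those scores to probabilities with a softmax. The action names, the logit values, and both helper functions are hypothetical and chosen only for illustration; they are not taken from the PoliFormer code.

```python
import math

# Hypothetical action names, for illustration only; the agent's actual
# action space is not listed on this page.
ACTIONS = ["move_ahead", "rotate_left", "rotate_right", "done"]


def action_probabilities(logits):
    """Convert raw per-action scores into probabilities with a softmax."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def format_bars(actions, probs, width=20):
    """Render one text bar per action, with length proportional to probability."""
    lines = []
    for name, p in zip(actions, probs):
        bar = "#" * round(p * width)
        lines.append(f"{name:<14}{bar:<{width}} {p:.2f}")
    return lines


# Made-up logits for a single step of an episode.
logits = [2.0, 0.5, 0.5, -1.0]
probs = action_probabilities(logits)
for line in format_bars(ACTIONS, probs):
    print(line)
```

Each printed row pairs an action with a bar whose length tracks its probability, which is one simple way to render such an overlay next to the camera frame.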

Backtracking in CHORES

PoliFormer (Stretch RE-1 embodiment) explores multiple rooms, backtracks, and finally finds the requested mug.

Finding a Laptop in ArchitecTHOR

The PoliFormer agent (LoCoBot embodiment) ignores the bathroom in its search for a laptop in the bedroom. Top-down view is for visualization purposes only.

ArchitecTHOR Image

Finding a Television in ProcTHOR

The PoliFormer agent (LoCoBot embodiment) searches through every room in a house before finally finding the television mounted on a wall. Top-down view is for visualization purposes only.

ProcTHOR Image

Finding a Garbage Can in iTHOR

The PoliFormer agent (LoCoBot embodiment) first performs a 360-degree spin to scan the environment. It then looks behind a bed, backtracks, and finally finds the garbage can next to a desk. Top-down view is for visualization purposes only.

iTHOR Image