Here we present a number of real-world examples filmed in a robot testing lab. All results were collected using our PoliFormer agent, which was trained in simulation with ground-truth detections; in these real-world examples, detections are instead generated by Detic, an open-vocabulary object detector. The agent's RGB navigation inputs are shown, along with a third-person view for some examples. All videos are sped up by up to 20x for ease of viewing.
PoliFormer finds an apple after navigating down a long hallway with many obstacles, including a chair that moves during the trajectory.
PoliFormer ignores the book it is looking at when the episode begins and searches multiple rooms until it finds the book titled "Humans". Please see the main paper for a close-up of the book in question; space constraints on the supplementary materials prevent us from uploading a high-resolution video of this trajectory.
Starting from a bedroom, PoliFormer explores, correctly avoids entering a bathroom, and finally finds the kitchen.
PoliFormer is able to find multiple objects in a single episode. Here it first finds a sofa and a book, then a houseplant, and finally a toilet.
PoliFormer follows a toy truck as it moves through multiple rooms in an indoor environment. Note that PoliFormer is not trained in dynamic environments but is nevertheless able to navigate while its target moves.
As in the example above, PoliFormer follows a person as they move down a hallway and into a kitchen.
Here we show multiple examples of PoliFormer's behavior in simulation. In addition to the agent's RGB camera input, we also display the probabilities the agent assigns to each of its available actions. For the Stretch agent we show two RGB images side by side: the first (left) is the agent's RGB camera input, and the second (right) corresponds to a "manipulation" camera that is positioned 90 degrees clockwise from the agent's front-facing camera. The manipulation camera is purely for visualization; our agent only sees the left image during training and inference.
PoliFormer (Stretch RE-1 embodiment) explores multiple rooms, backtracks, and finally finds the requested mug.
The PoliFormer agent (LoCoBot embodiment) ignores the bathroom in its search for a laptop in the bedroom. Top-down view is for visualization purposes only.
The PoliFormer agent (LoCoBot embodiment) searches through every room in a house before finally finding the television mounted on a wall. Top-down view is for visualization purposes only.
The PoliFormer agent (LoCoBot embodiment) first performs a 360-degree spin to scan the environment. It then looks behind a bed, backtracks, and finally finds the garbage can next to a desk. Top-down view is for visualization purposes only.