<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <meta name="description"
          content="PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators.">
    <meta name="keywords" content="PoliFormer, Embodied Navigation, On-Policy RL, Transformer Policy">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Embodied Navigation, On-Policy RL, Transformer Policy</title>

    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
          rel="stylesheet">

    <link rel="stylesheet" href="./static/css/bulma.min.css">
    <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="./static/css/index.css">
    <link rel="icon" href="./static/images/favicon.svg">

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script defer src="./static/js/fontawesome.all.min.js"></script>
    <script src="./static/js/bulma-carousel.min.js"></script>
    <script src="./static/js/bulma-slider.min.js"></script>
    <script src="./static/js/index.js"></script>

</head>
<body>

<nav class="navbar" role="navigation" aria-label="main navigation">
    <div class="navbar-brand">
        <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
            <span aria-hidden="true"></span>
            <span aria-hidden="true"></span>
            <span aria-hidden="true"></span>
        </a>
    </div>
</nav>


<section class="hero">
    <div class="hero-body">
        <div class="container is-max-desktop">
            <div class="columns is-centered">
                <div class="column has-text-centered">
                    <h1 class="title is-1 publication-title">Supplementary website for<br/><i>"PoliFormer: Scaling
                        On-Policy RL with Transformers Results in Masterful Navigators"</i></h1>
                    <div class="is-size-5 publication-authors">
            <span class="author-block">
              Anonymous CoRL authors
            </span>
                    </div>

                </div>

            </div>
        </div>
    </div>
</section>

<section class="hero teaser">
    <div class="container is-max-desktop">
        <div class="hero-body">
            <img src="./static/images/poliformer-header.jpg"
                 class="interpolation-image"
                 alt="header-image."/>
            <h2 class="subtitle has-text-centered">
                <span class="dpoliformer">PoliFormer</span> is a transformer-based policy trained using RL at scale in
                simulation which achieves masterful navigation abilities in the real world.
            </h2>
        </div>
        <div class="content has-text-justified">
            <h2 class="title is-3 has-text-centered">Supplementary website contents</h2>
            <p>
                This supplementary website contains a collection of qualitative examples of our <span class="dpoliformer">PoliFormer</span> model in the
                real world and in simulation.
            </p>
            <ul>
                <li><a style="font-size: 1.5rem" href='#Real-world examples'>Real-world examples</a>
                    <ul>
                        <li><a href='#find_apple_locobot'>Find an apple (LoCoBot)</a></li>
                        <li><a href='#find_humans_book_stretch'>Find a book with title "Humans" (Stretch RE-1)</a></li>
                        <li><a href='#find_kitchen_stretch'>Find the kitchen (Stretch RE-1)</a></li>
                        <li><a href='#find_multi_stretch'>Find a sofa, book, toilet, and houseplant (Stretch RE-1)</a></li>
                        <li><a href='#follow_toy_truck_stretch'>Follow the toy truck (Stretch RE-1)</a></li>
                        <li><a href='#follow_person_stretch'>Follow the person (Stretch RE-1)</a></li>
                    </ul>
                </li>
                <li><a style="font-size: 1.5rem" href='#Simulation examples'>Simulation examples</a>
                    <ul>
                        <li><a href='#sim-2986_6_search_for_a_mug'>Backtracking in CHORES</a></li>
                        <li><a href='#sim-ArchitecTHOR-Test-00__proc5__global149__Laptop'>Finding a Laptop in ArchitecTHOR</a></li>
                        <li><a href='#sim-51__proc16__global519__Television'>Finding a Television in ProcTHOR</a></li>
                        <li><a href='#sim-FloorPlan326__proc21__global366__GarbageCan'>Finding a Garbage Can in iTHOR</a></li>
                    </ul>
                </li>
            </ul>
            <br/>
        </div>
    </div>
</section>

<section class="section hero is-light">
    <div class="container is-max-desktop">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3" id="Real-world examples">Real-world examples</h2>
                <div class="content has-text-justified">
                    <p>
                        Here we present a number of real-world examples filmed in a robot testing lab. All results are collected using our <span class="dpoliformer">PoliFormer</span> agent, which was trained in simulation with ground-truth detections; in these real-world examples, detections are generated using Detic, an open-vocabulary object detector. The agent's RGB navigation inputs are shown, along with a third-person perspective for some examples. All videos are sped up by up to 20x for ease of viewing.
                    </p>

                    <img src="./static/images/floorplan.png" class="interpolation-image" alt="floorplan"/>
                    <h5 class="subtitle has-text-centered">
                        Floorplan of the real-world environment used for these qualitative examples.
                    </h5>
                </div>

                <h3 class="title is-4" id="find_apple_locobot" >Find an apple (LoCoBot)</h3>
                <div class="content has-text-justified">
                    <p>
                        <span class="dpoliformer">PoliFormer</span> finds an apple after navigating down a long hallway with many obstacles, including a chair that moves during the trajectory.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/find_apple_locobot_480p.mp4" type="video/mp4">
                    </video>
                </div>

                <h3 class="title is-4"  id="find_humans_book_stretch" >Find a book with title "Humans" (Stretch RE-1)</h3>
                <div class="content has-text-justified">
                    <p>
                        <span class="dpoliformer">PoliFormer</span> ignores the book it begins the episode looking at
                        and searches multiple rooms until it finds the book with title "Humans". Please see the main
                        paper for a close-up of the book in question; space constraints on the supplementary materials
                        prevent us from uploading a high-resolution video of this trajectory.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/find_humans_book_stretch_480p.mp4" type="video/mp4">
                    </video>
                </div>


                <h3 class="title is-4" id="find_kitchen_stretch" >Find the kitchen (Stretch RE-1)</h3>
                <div class="content has-text-justified">
                    <p>
                        Starting from a bedroom, <span class="dpoliformer">PoliFormer</span> explores, correctly
                        avoids entering a bathroom, and finally finds the kitchen.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/find_kitchen_480p.mp4" type="video/mp4">
                    </video>
                </div>


                <h3 class="title is-4" id="find_multi_stretch" >Find a sofa, book, toilet, and houseplant (Stretch RE-1)</h3>
                <div class="content has-text-justified">
                    <p>
                       <span class="dpoliformer">PoliFormer</span> is able to find multiple objects in a single episode.
                        Here it initially finds a sofa and book, then a houseplant, and finally a toilet.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/find_multi_stretch_480p.mp4" type="video/mp4">
                    </video>
                </div>

                <h3 class="title is-4" id="follow_toy_truck_stretch" >Follow the toy truck (Stretch RE-1)</h3>
                <div class="content has-text-justified">
                    <p>
                        <span class="dpoliformer">PoliFormer</span> follows a toy truck as it moves through multiple
                        rooms in an indoor environment. Note that
                        <span class="dpoliformer">PoliFormer</span> is not trained in dynamic environments but is nevertheless able to navigate while its target moves.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/follow_toy_truck_480p.mp4" type="video/mp4">
                    </video>
                </div>

                <h3 class="title is-4" id="follow_person_stretch" >Follow the person (Stretch RE-1)</h3>
                <div class="content has-text-justified">
                    <p>
                        Similar to the above example, <span class="dpoliformer">PoliFormer</span> follows a person as they
                        move down a hallway and into a kitchen.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="static/videos/real-world/follow_person_stretch_480p.mp4" type="video/mp4">
                    </video>
                </div>

            </div>
        </div>
    </div>
</section>


<section class="section hero ">
    <div class="container is-max-desktop">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3" id="Simulation examples">Simulation examples</h2>

                <p>
                    Here we show multiple examples of <span class="dpoliformer">PoliFormer</span>'s behavior in simulation. In addition to
                    the agent's RGB camera input, we also display the
                    probabilities the agent assigns to each of its available actions. For the Stretch agent
                    we show two RGB images side by side: the first (left) is the agent's RGB camera input, and the
                    second (right) corresponds to a "manipulation" camera that is positioned 90 degrees clockwise
                    from the agent's front-facing camera. The manipulation camera is purely for visualization;
                    <strong>our agent only sees the left image during training and inference</strong>.
                </p>

                <h3 class="title is-4" style="padding-top: 1em" id="sim-2986_6_search_for_a_mug" >Backtracking in CHORES</h3>
                <div class="content has-text-justified">
                    <p>
                        <span class="dpoliformer">PoliFormer</span> (Stretch RE-1 embodiment) explores multiple rooms, backtracks, and
                        finally finds the requested mug.
                    </p>
                </div>
                <div class="content has-text-centered">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="100%">
                        <source src="./static/videos/sim/chores/2986_6_search_for_a_mug.mp4" type="video/mp4">
                    </video>
                </div>

                <h3 class="title is-4"
                    id="sim-ArchitecTHOR-Test-00__proc5__global149__Laptop"
                    style="padding-top: 1em" >Finding a Laptop in ArchitecTHOR</h3>
                <div class="content has-text-justified">
                    <p>
                        The <span class="dpoliformer">PoliFormer</span> agent (LoCoBot embodiment) ignores the bathroom in its search for a laptop
                        in the bedroom. Top-down view is for visualization purposes only.
                    </p>
                </div>
                <div class="content has-text-centered"
                     style="display: flex; align-items: center; justify-content: center;">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="50%"> <!-- Adjust the width as needed -->
                        <source src="./static/videos/sim/architecthor/ArchitecTHOR-Test-00__proc5__global149__Laptop.mp4" type="video/mp4">
                    </video>
                    <img src="./static/videos/sim/architecthor/ArchitecTHOR-Test-00__proc5__global149__Laptop.png" alt="ArchitecTHOR Image" width="25%"> <!-- Adjust the width as needed -->
                </div>


                <h3 class="title is-4"
                    id="sim-51__proc16__global519__Television"
                    style="padding-top: 1em" >Finding a Television in ProcTHOR</h3>
                <div class="content has-text-justified">
                    <p>
                        The <span class="dpoliformer">PoliFormer</span> agent (LoCoBot embodiment) searches through every room in a house before
                        finally finding the television mounted on a wall. Top-down view is for visualization purposes only.
                    </p>
                </div>
                <div class="content has-text-centered" style="display: flex; align-items: center; justify-content: center;">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="50%"> <!-- Adjust the width as needed -->
                        <source src="./static/videos/sim/procthor/51__proc16__global519__Television.mp4" type="video/mp4">
                    </video>
                    <img src="./static/videos/sim/procthor/51__proc16__global519__Television.png" alt="ProcTHOR Image" width="25%"> <!-- Adjust the width as needed -->
                </div>

                <h3 class="title is-4"
                    id="sim-FloorPlan326__proc21__global366__GarbageCan"
                    style="padding-top: 1em" >Finding a Garbage Can in iTHOR</h3>
                <div class="content has-text-justified">
                    <p>
                        The <span class="dpoliformer">PoliFormer</span> agent (LoCoBot embodiment) first performs a 360-degree spin to scan the environment;
                        it then looks behind a bed, backtracks, and finally finds the garbage can next to a desk. Top-down view is for visualization purposes only.
                    </p>
                </div>
                <div class="content has-text-centered" style="display: flex; align-items: center; justify-content: center;">
                    <video
                           controls
                           muted
                           preload
                           playsinline
                           width="50%">
                        <source src="./static/videos/sim/ithor/FloorPlan326__proc21__global366__GarbageCan.mp4" type="video/mp4">
                    </video>
                    <img src="./static/videos/sim/ithor/FloorPlan326__proc21__global366__GarbageCan.png" alt="iTHOR Image" width="25%"> <!-- Adjust the width as needed -->
                </div>

            </div>
        </div>
    </div>
</section>

<footer class="footer" style="background-color: #f5f5f5">
    <div class="container">
        <div class="columns is-centered">
            <div class="column is-8">
                <div class="content">
                    <p>
                        This website is based on the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA
                        4.0</a> licensed
                        <a rel="template" href="https://github.com/nerfies/nerfies.github.io">Nerfies website</a>.
                    </p>
                </div>
            </div>
        </div>
    </div>
</footer>

</body>
</html>
