
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<script src="bootstrap.js"></script>
<script type="text/javascript" charset="utf-8" src="https://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js"></script> 
<!---
<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
--->
<script src="load-mathjax.js" async></script>


<style type="text/css">
body {
    font-family: "Titillium Web", "HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
    font-weight: 300;
    font-size: 17px;
    margin-left: auto;
    margin-right: auto;
}

@media screen and (min-width: 980px){
    body {
        width: 980px;
    }
}


h1 {
    font-weight:300;
    line-height: 1.15em;
}

h2 {
    font-size: 1.75em;
}
a:link,a:visited {
    color: #5364cc;
    text-decoration: none;
}
a:hover {
    color: #208799;
}
h1 {
    text-align: center;
}
h2,h3 {
    text-align: left;
}

h1 {
    font-size: 40px;
    font-weight: 500;
}
h2 {
    font-weight: 400;
    margin: 16px 0px 4px 0px;
}
h3 {
    font-weight: 600;
    margin: 16px 0px 4px 0px;
}

.paper-title {
    padding: 1px 0px 1px 0px;
}
section {
    margin: 32px 0px 32px 0px;
    text-align: justify;
    clear: both;
}
.col-5 {
     width: 20%;
     float: left;
}

.move-down {
    margin-top:1.2cm;
}

.col-4 {
     width: 25%;
     float: left;
}
.col-3 {
     width: 33%;
     float: left;
}
.col-2 {
     width: 50%;
     float: left;
}
.col-1 {
     width: 100%;
     float: left;
}

.author-row, .affil-row {
    font-size: 26px;
}

.author-row-new { 
    text-align: center; 
}

.author-row-new a {
    display: inline-block;
    font-size: 20px;
    padding: 4px;
}

.author-row-new sup {
    color: #313436;
    font-size: 12px;
}

.affiliations-new {
    font-size: 18px;
    text-align: center;
    width: 80%;
    margin: 0 auto;
    margin-bottom: 20px;
}

.row {
    margin: 16px 0px 16px 0px;
}
.authors {
    font-size: 26px;
}
.affiliatons {
    font-size: 18px;
}
.affil-row {
    margin-top: 18px;
}
.teaser {
    max-width: 100%;
}
.text-center {
    text-align: center;  
}
.screenshot {
    width: 256px;
    border: 1px solid #ddd;
}
.screenshot-el {
    margin-bottom: 16px;
}
hr {
    height: 1px;
    border: 0; 
    border-top: 1px solid #ddd;
    margin: 0;
}
.material-icons {
    vertical-align: -6px;
}
p {
    line-height: 1.25em;
}
.caption {
    font-size: 16px;
    color: #666;
    margin-top: 4px;
    margin-bottom: 10px;
	text-align: left;
}


video {
    display: block;
    margin: auto;
}


figure {
    display: block;
    margin: auto;
    margin-top: 10px;
    margin-bottom: 10px;
}
#bibtex pre {
    font-size: 14px;
    background-color: #eee;
    padding: 16px;
}
.blue {
    color: #2c82c9;
    font-weight: bold;
}
.orange {
    color: #d35400;
    font-weight: bold;
}
.flex-row {
    display: flex;
    flex-flow: row wrap;
    padding: 0;
    margin: 0;
    list-style: none;
}

.paper-btn-coming-soon {
    position: relative; 
    top: 0;
    left: 0;
}

.coming-soon {
    position: absolute;
    top: -15px;
    right: -15px;
}

.center {
  margin-left: 10.0%;
  margin-right: 10.0%;
}

.paper-btn {
  position: relative;
  text-align: center;

  display: inline-block;
  margin: 8px;
  padding: 8px 8px;

  border-width: 0;
  outline: none;
  border-radius: 2px;
  
  background-color: #5364cc;
  color: white !important;
  font-size: 20px;
  width: 100px;
  font-weight: 600;
}
.paper-btn-parent {
    display: flex;
    justify-content: center;
    margin: 16px 0px;
}

.paper-btn:hover {
    opacity: 0.85;
}

.container {
    margin-left: auto;
    margin-right: auto;
    padding-left: 16px;
    padding-right: 16px;
}

.venue {
    font-size: 23px;
}

.topnav {
    background-color: #EEEEEE;
    overflow: hidden;
}

.topnav div {
    max-width: 1070px;
    margin: 0 auto;
}

.topnav a {
    display: inline-block;
    color: black;
    text-align: center;
    vertical-align: middle;
    padding: 16px 16px;
    text-decoration: none;
    font-size: 18px;
}

.topnav img {
    padding: 2px 0px;
    width: 100%;
    margin: 0.2em 0px 0.3em 0px;
    vertical-align: middle;
}

pre {
    font-size: 0.9em;
    padding-left: 7px;
    padding-right: 7px;
    padding-top: 3px;
    padding-bottom: 3px;
    border-radius: 3px;
    background-color: rgb(235, 235, 235);
    overflow-x: auto;
}

.download-thumb {
    display: flex;
}

@media only screen and (max-width: 620px) {
    .download-thumb {
        display: none;
    }
}

.paper-stuff {
    width: 50%;
    font-size: 20px;
}

@media only screen and (max-width: 620px) {
    .paper-stuff {
        width: 100%;
    }
}
* {
  box-sizing: border-box;
}

.column {
  text-align: center;
  float: left;
  width: 16.666%;
  padding: 5px;
}
.column3 {
  text-align: center;
  float: left;
  width: 33.333%;
  padding: 5px;
}
.column4 {
  text-align: center;
  float: left;
  width: 50%;
  padding: 5px;
}
.column5 {
  text-align: center;
  float: left;
  width: 20%;
  padding: 5px;
}
.column10 {
  text-align: center;
  float: left;
  width: 10%;
  padding: 5px;
}
.border-right {
    border-right: 1px solid black;
}
.border-bottom{
    border-bottom: 1px solid black;
}


.row-center {
    margin: 16px 0px 16px 0px;
    text-align: center;
}

/* Clearfix (clear floats) */
.row::after {
  content: "";
  clear: both;
  display: table;
}
.img-fluid {
  max-width: 100%;
  height: auto;
}
.figure-img {
  margin-bottom: 0.5rem;
  line-height: 1;
}








.rounded-circle {
  border-radius: 50% !important;
}






/* Responsive layout - makes the three columns stack on top of each other instead of next to each other */
@media screen and (max-width: 500px) {
  .column {
    width: 100%;
  }
}
@media screen and (max-width: 500px) {
  .column3 {
    width: 100%;
  }
}

</style>
<link rel="stylesheet" href="bootstrap-grid.css">

<script type="text/javascript" src="../js/hidebib.js"></script>
    <link href='https://fonts.googleapis.com/css?family=Titillium+Web:400,600,400italic,600italic,300,300italic' rel='stylesheet' type='text/css'>
    <head>
        <title> https://fonts.googleapis.com/css2?family=Material+Icons</title>
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <meta property="og:description" content="Learning Universal Policies via Text-Guided Video Generation"/>
        <link href="https://fonts.googleapis.com/css2?family=Material+Icons" rel="stylesheet">
    </head>

 <body>


<div class="container">
    <div class="paper-title">
    <h1> 
        Learning Universal Policies via Text-Guided Video Generation
    </div>

    </div>

    <section id="abstract"/>
        <hr>
        <h2>Abstract</h2>
        <div class="flex-row">
            <p>
            A goal of artificial intelligence is to construct a  agent that can solve a wide variety of tasks. Recent progress in text-guided image synthesis has yielded models with impressive abilities and rich combinatorial generalization of complex novel images across different domains.  Motivated by this success, we investigate whether we may use such tools to construct more general-purpose agents. Specifically, we cast the sequential decision making problem as a text-conditioned video generation problem, where, given a text-encoded specification of a desired goal, a planner synthesizes a set of future frames depicting its planned actions in the future, and the actions will be extracted from the generated video.  By leveraging text as our underlying goal specification, we are able to naturally combinatorially generalize to unseen goals. Our policy-as-video formulation can further represent environments with different state and action spaces in a unified space of images, enabling learning and generalization across a wide range of robotic manipulation tasks. Finally, by leveraging pretrained language embeddings and widely available videos on the internet, our formulation enables knowledge transfer through predicting highly realistic video plans of real robots.
            </p>
        </div>
    </section>
        

    <section id="results">
        <hr>
        <h2>Combinatorial Generalization</h2>  
            <div class="flex-row">
                <p>	
                Below, we illustrate generated videos on unseen combinatorial combinations of goals. Our approach is able to
                synthesize a diverse set of different behaviors which satisfy unseen language subgoals.
                </p>
            </div> 

            <center>
            <figure>
                <video width="1000" loop autoplay muted controls>
                    <source src="materials/website_figure_1.mp4" type="video/mp4">
                </video>
            </figure>
            </center>

        <hr>


        <h2>Multitask Learning</h2>  
            <div class="flex-row">
                <p>
                Below, we illustrate generated videos on unseen tasks. Our approach is further able to synthesize a diverse set
                of different behaviors which satisfy unseen language tasks.
                </p>
            </div> 

            <center>
            <figure>
                <video width="1000" loop autoplay muted controls>
                    <source src="materials/website_figure_2.mp4" type="video/mp4">
                </video>
            </figure>
            </center>

        <hr>

        <h2>Real Robot Videos</h2>  
            <div class="flex-row">
                <p>Below, we illustrate generated videos given language instructions on unseen images. Our approach is
                able to synthesize a diverse set of different behaviors which satisfy language instructions.</p>
            </div> 

            <center>
            <figure>
                <video width="1000" loop autoplay muted controls>
                    <source src="materials/website_figure_3.mp4" type="video/mp4">
                </video>
            </figure>
            </center>

            <div class="flex-row">
                <p>Our approach is further able to generate videos of robot behaviors given unseen natural language
                instructions. Note that pretraining on a large online dataset of text/videos significantly helps 
                generalization to unseen natural language queries.
                </p>
            </div> 

            <center>
            <figure>
                <video width="700" loop autoplay muted controls>
                    <source src="materials/website_figure_4.mp4" type="video/mp4">
                </video>
            </figure>
            </center>

        <hr>

    </section> 

</div>
</body>
</html>
