
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>

<!-- ======================================================================= -->
<script src="http://www.google.com/jsapi" type="text/javascript"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/latest.js?config=AM_CHTML"></script>
<script type="text/javascript">google.load("jquery", "1.3.2");</script>
<style type="text/css">
  body {
    font-family: "Titillium Web","HelveticaNeue-Light", "Helvetica Neue Light", "Helvetica Neue", Helvetica, Arial, "Lucida Grande", sans-serif;
    font-weight:300;
    font-size:18px;
    margin-left: auto;
    margin-right: auto;
    width: 100%;
  }

  h1 {
    font-weight:300;
  }

  div {
    max-width: 95%;
    margin:auto;
    padding: 10px;
  }

  .paddingBetweenCols td {
  padding: 5px 15px;
}

  .table-like {
    display: flex;
    flex-wrap: wrap;
    flex-flow: row wrap;
    justify-content: center;
  }

  .disclaimerbox {
    background-color: #eee;
    border: 1px solid #eeeeee;
    border-radius: 10px ;
    -moz-border-radius: 10px ;
    -webkit-border-radius: 10px ;
    padding: 20px;
  }

  video.header-vid {
    height: 140px;
    border: 1px solid black;
    border-radius: 10px ;
    -moz-border-radius: 10px ;
    -webkit-border-radius: 10px ;
  }

  img {
    padding: 0;
    display: block;
    margin: 0 auto;
    max-height: 100%;
    max-width: 100%;
  }

  iframe {
    max-width: 100%;
  }

  img.header-img {
    height: 140px;
    border: 1px solid black;
    border-radius: 10px ;
    -moz-border-radius: 10px ;
    -webkit-border-radius: 10px ;
  }

  img.rounded {
    border: 1px solid #eeeeee;
    border-radius: 10px ;
    -moz-border-radius: 10px ;
    -webkit-border-radius: 10px ;
  }

  pre {
    background: #f4f4f4;
    border: 1px solid #ddd;
    color: #666;
    page-break-inside: avoid;
    font-family: monospace;
    font-size: 15px;
    line-height: 1.6;
    margin-bottom: 1.6em;
    max-width: 100%;
    overflow: auto;
    padding: 10px;
    display: block;
    word-wrap: break-word;
}

  a:link,a:visited
  {
    color: #1367a7;
    text-decoration: none;
  }
  a:hover {
    color: #208799;
  }

  td.dl-link {
    height: 160px;
    text-align: center;
    font-size: 22px;
  }

  .rotate {
  text-align: center;
  white-space: nowrap;
  vertical-align: middle;
  width: 1.5em;
}
.rotate div {
     -moz-transform: rotate(-90.0deg);  /* FF3.5+ */
       -o-transform: rotate(-90.0deg);  /* Opera 10.5 */
  -webkit-transform: rotate(-90.0deg);  /* Saf3.1+, Chrome */
             filter:  progid:DXImageTransform.Microsoft.BasicImage(rotation=0.083);  /* IE6,IE7 */
         -ms-filter: "progid:DXImageTransform.Microsoft.BasicImage(rotation=0.083)"; /* IE8 */
         margin-left: -10em;
         margin-right: -10em;
}

  .layered-paper-big { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
    box-shadow:
            0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
            5px 5px 0 0px #fff, /* The second layer */
            5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
            10px 10px 0 0px #fff, /* The third layer */
            10px 10px 1px 1px rgba(0,0,0,0.35), /* The third layer shadow */
            15px 15px 0 0px #fff, /* The fourth layer */
            15px 15px 1px 1px rgba(0,0,0,0.35), /* The fourth layer shadow */
            20px 20px 0 0px #fff, /* The fifth layer */
            20px 20px 1px 1px rgba(0,0,0,0.35), /* The fifth layer shadow */
            25px 25px 0 0px #fff, /* The sixth layer */
            25px 25px 1px 1px rgba(0,0,0,0.35); /* The sixth layer shadow */
    margin-left: 10px;
    margin-right: 45px;
  }


  .layered-paper { /* modified from: http://css-tricks.com/snippets/css/layered-paper/ */
    box-shadow:
            0px 0px 1px 1px rgba(0,0,0,0.35), /* The top layer shadow */
            5px 5px 0 0px #fff, /* The second layer */
            5px 5px 1px 1px rgba(0,0,0,0.35), /* The second layer shadow */
            10px 10px 0 0px #fff, /* The third layer */
            10px 10px 1px 1px rgba(0,0,0,0.35); /* The third layer shadow */
    margin-top: 5px;
    margin-left: 10px;
    margin-right: 30px;
    margin-bottom: 5px;
  }

  .vert-cent {
    position: relative;
      top: 50%;
      transform: translateY(-50%);
  }

  hr
  {
    border: 0;
    height: 1px;
    max-width: 1100px;
    background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.75), rgba(0, 0, 0, 0));
  }

  #authors td {
    padding-bottom:5px;
    padding-top:30px;
  }
</style>
<!-- ======================================================================= -->

<!-- Start : Google Analytics Code -->
<!-- <script async src="https://www.googletagmanager.com/gtag/js?id=UA-64069893-4"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-64069893-4');
</script> -->
<!-- End : Google Analytics Code -->

<script type="text/javascript" src="resources/hidebib.js"></script>
<link href='https://fonts.googleapis.com/css?family=Titillium+Web:400,600,400italic,600italic,300,300italic' rel='stylesheet' type='text/css'>
<head>
<div max-width=100%>
  <meta charset="utf-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <link rel="icon" type="image/png" href="resources/clvr_icon.png">
  <title>Learning to Synthesize Programs as Interpretable and Generalizable Policies</title>
  <meta name="HandheldFriendly" content="True" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <link rel="canonical" href="https://jesbu1.github.io/" />
  <meta name="referrer" content="no-referrer-when-downgrade" />

  <meta property="og:site_name" content="Learning to Synthesize Programs as Interpretable and Generalizable Policies" />
  <meta property="og:type" content="video.other" />
  <meta property="og:title" content="Learning to Synthesize Programs as Interpretable and Generalizable Policies" />
  <meta property="og:description" content="Dweep Trivedi*, Jesse Zhang*, Shao-Hua Sun*, Joseph Lim. Learning to Synthesize Programs as Interpretable and Generalizable Policies. NeurIPS 2021." />
  <meta property="og:url" content="https://clvrai.github.io/leaps" />
  <meta property="og:image" content="https://clvrai.github.io/leaps/resources/leaps.png" />  <!-- UPDATE -->
  <meta property="og:video" content="https://www.youtube.com/v/XXX" />   <!-- UPDATE -->

  <meta property="article:publisher" content="https://jesbu1.github.io/" />
  <meta name="twitter:card" content="summary_large_image" />
  <meta name="twitter:title" content="Learning to Synthesize Programs as Interpretable and Generalizable Policies" />
  <meta name="twitter:description" content="Dweep Trivedi*, Jesse Zhang*, Shao-Hua Sun*, Joseph Lim. Learning to Synthesize Programs as Interpretable and Generalizable Policies. NeurIPS 2021." />
  <meta name="twitter:url" content="https://clvrai.github.io/leaps" />
  <meta name="twitter:image" content="https://clvrai.github.io/leaps/resources/leaps.png" />   <!-- UPDATE -->
  <meta property="og:image:width" content="1024" />
  <meta property="og:image:height" content="768" />

  <script src="https://www.youtube.com/iframe_api"></script>
  <meta name="twitter:card" content="player" />
  <meta name="twitter:player" content="https://www.youtube.com/embed/XXX?rel=0&showinfo=0" />   <!-- UPDATE -->
  <meta name="twitter:player:width" content="640" />
  <meta name="twitter:player:height" content="360" />
</head>

<body>

      <br>
      <center><span style="font-size:44px;font-weight:bold;">Learning to Synthesize Programs<br/> as Interpretable and Generalizable Policies</span></center><br/>
      <div class="table-like" style="justify-content:space-evenly;max-width:800px;margin:auto;">
          <div><center><span style="font-size:30px"><a href="https://www.linkedin.com/in/dweep-trivedi/" target="_blank">Dweep Trivedi*</a></span></center>
          <!-- <center><span style="font-size:18px">USC</span></center> -->
          </div>

          <div><center><span style="font-size:30px"><a href="https://jesbu1.github.io/" target="_blank">Jesse Zhang*</a></span></center>
          <!-- <center><span style="font-size:18px">UPenn</span></center>-->          
          </div>

          <div><center><span style="font-size:30px"><a href="http://shaohua0116.github.io/" target="_blank">Shao-Hua Sun*</a></span></center>
          <!-- <center><span style="font-size:18px">UPenn</span></center>-->          
          </div>

          <div><center><span style="font-size:30px"><a href="https://www.clvrai.com/" target="_blank">Joseph Lim</a></span></center>
          <!-- <center><span style="font-size:18px">UC Berkeley</span></center> -->
          </div>
      </div>
      <table align=center width=30% style="padding-top:0px;padding-bottom:0px">
          <tr>
            <td align=center><center><span style="font-size:25px"><a href="https://www.clvrai.com/" target="_blank">CLVR Lab, University of Southern California</a></span></center></td>
          </tr>
      </table>
      <!-- <center><span style="font-size:20px;">Conference on Robot Learning (CoRL), 2020</span></center> -->

      <div class="table-like" style="justify-content:space-evenly;max-width:500px;margin:auto;padding:5px">
        <div><center><span style="font-size:28px"><a href="https://arxiv.org/abs/2108.13643">[Paper]</a></span></center></div>  <!-- UPDATE -->
        <div><center><span style="font-size:28px"><a href="https://www.github.com/clvrai/leaps">[Code]</a></span></center> </div>   <!-- UPDATE -->
        <div><center><span style="font-size:28px"><a href="resources/LEAPS_slides.pdf">[Slides]</a></span></center> </div>   <!-- UPDATE -->
        <div><center><span style="font-size:28px"><a href="https://nips.cc/virtual/2021/poster/26528">[Talk (10 min)]</a></span></center></div>  <!-- UPDATE -->
        <!-- <div><center><span style="font-size:28px"><a href='https://youtu.be/w32twGTWvDU'>[Talk (5 min)]</a></span></center> </div> -->
      </div>

      <!-- ### VIDEO ### -->
      <!-- <center>
      <iframe width="768" height="432" max-width="100%" src="https://www.youtube.com/embed/axXx-x86IeY?autoplay=1&loop=1&playlist=axXx-x86IeY" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></center> -->
      <!-- <iframe width="768" height="432" max-width="100%" src="resources/video.m4v" frameborder="0" allowfullscreen></iframe></center> -->
      <!-- <br> -->

      <br/>
          <center><img src = "resources/leaps_teaser.jpeg" width="600px"></img></center>

      <div style="width:800px; margin:0 auto;padding:5px" align="justify">
        Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning structured, programmatic policies. Yet, these works employ limited policy representations or require stronger supervision. Our framework instead learns to synthesize programs solely from reward signals. 
      </div>
      <br><hr>


      <!-- ################### OVERVIEW #################### -->
        <center><h1>Overview</h1></center>
        <div style="width:800px; margin:0 auto;padding:5px" align="justify">
        To address the interpretability and generalization issues of deep reinforcement learning (DRL) methods,
        we propose synthesizing programs from reward. 

        These programs are human-readable, flexible, and expressive. However, programs are difficult to synthesize purely from environment reward.
        <center><br><img src = "resources/DNN_to_LEAPS.png" width="1000px"></img><br></center>

        </div>
        <div style="width:800px; margin:0 auto;padding:5px" align="justify">
            Due to the difficulty of directly synthesizing discrete program tokens only from task reward,
            we break down the problem into two stages:
            <ol>
            <li><b>Learning program embedding stage:</b> we learn a program
            embedding space by training a program encoder `q_\phi` that encodes
            a program as a latent program `z`, a program decoder `p_\theta` that
            decodes the latent program `z` back to a reconstructed program
            `\hat{p}`, and a policy `\pi` that conditions on the latent program
            `z` and acts as a neural program executor to produce the execution
            trace of the latent program `z`. The model optimizes a combination
            of a program reconstruction loss `\mathcal{L}^\text{P}`, a program
            behavior reconstruction loss `\mathcal{L}^\text{R}`, and a latent
            behavior reconstruction loss `\mathcal{L}^\text{L}`.
            `a_1, a_2, ..., a_t` denote the actions produced by either the
            policy `\pi` or program execution.</li>
            <li><b>Latent program search stage:</b> we use the Cross-Entropy
            Method (CEM) to iteratively search for the best candidate latent
            programs, which can be decoded and executed to maximize the reward
            on a given task.</li>
        </ol>
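        The first stage optimizes a combination of three losses. The sketch below only illustrates one way to combine them, assuming a plain weighted sum; the function name <code>combined_loss</code> and the weights <code>w_p</code>, <code>w_r</code>, and <code>w_l</code> are hypothetical and not taken from the paper.
        <pre><code style="display:block; white-space:pre-wrap">
```python
def combined_loss(loss_p, loss_r, loss_l, w_p=1.0, w_r=1.0, w_l=1.0):
    """Weighted sum of the three stage-one losses.

    loss_p: program reconstruction loss
    loss_r: program behavior reconstruction loss
    loss_l: latent behavior reconstruction loss
    The weights are illustrative assumptions, not values from the paper.
    """
    return w_p * loss_p + w_r * loss_r + w_l * loss_l


# Example with made-up loss values and equal weights.
total = combined_loss(0.5, 0.25, 0.125)
```
        </code></pre>
        Setting a weight to zero removes the corresponding objective, which is one simple way to study the contribution of each loss in such a setup.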
        </div><br>
        <center><img src = "resources/leaps_model.jpeg" width="1000px"></img><br></center>
        <hr>
    <!-- ################### ENVIRONMENTS #################### -->
      <div style="overflow-x: auto;">
    <table align=center width=1000px>
        <center><h1>Karel Environments</h1></center>

        <tr>
        <td style="width:20%">
          <center><h2>StairClimber</h2></center>
          <img src = "resources/karel_gifs/gt_stairClimber_10_10.gif" width="100%" autoplay muted loop></img>
        </td>
        <td style="width:3%"></td>
        <td style="width:20%">
          <center><h2>FourCorner</h2></center>
          <img src = "resources/karel_gifs/gt_fourCorners_10_10.gif" width="100%" autoplay muted loop></img>
        </td>
        <td style="width:3%"></td>
        <td style="width:20%">
          <center><h2>TopOff</h2></center>
          <img src = "resources/karel_gifs/gt_topOff_10_10.gif" width="100%" autoplay muted loop></img>
        </td>
        <td style="width:3%"></td>
        <td style="width:20%">
          <center><h2>Maze</h2></center>
          <img src = "resources/karel_gifs/gt_randomMaze_8_8.gif" width="100%" autoplay muted loop></img>
        </td>
        </tr>
        </table>
        <br>
        <table align=center width=1000px>
        <tr>
        <td style="width:15%"></td>
        <td style="width:31.5%">
          <center><h2>CleanHouse</h2></center>
          <img src = "resources/karel_gifs/gt_cleanHouse_14_22.gif" width="100%" autoplay muted loop></img>
        </td>
        <td style="width:10%"></td>
        <td style="width:20%">
          <center><h2>Harvester</h2></center>
          <img src = "resources/karel_gifs/gt_harvester_10_10.gif" width="100%" autoplay muted loop></img>
        </td>
        <td style="width:23%"></td>
        </tr>
        </table>
      </div>
        <br>

      <div style="width:800px; margin:0 auto;" align="justify">
        We evaluate our approach on a set of sparse-reward Karel environments, a domain commonly used in program synthesis. The tasks are designed to highlight the performance differences between our program policies and DRL baselines.
      </div>
      <hr>


      <!-- ################### QUALITATIVE ANALYSIS #################### -->

      <div style="width:800px; margin:0 auto; text-align:center">
        <center><h1>How well does LEAPS solve the Karel tasks?</h1></center>
      </div>
      <!-- <br/> -->
    <div style="overflow-x:auto;">
        <table align=center width=900px style="table-layout: fixed" class="paddingBetweenCols">
          <tr>
            <th><center><h2></h2></center></th>
            <th><center><h2>DRL</h2></center></th>
            <th><center><h2>LEAPS</h2></center></th>
            <th><center><h2></h2></center></th>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            StairClimber
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/stairClimber.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_stairClimber_12_12.gif" width="100%" autoplay muted loop></img></td>
            <td>Both methods learn to successfully climb the stairs. </td>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            FourCorner
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/fourCorner.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_fourCorners_12_12.gif" width="100%" autoplay muted loop></img></td>
            <td>DRL only manages to place one marker while LEAPS places one in each corner.</td>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            TopOff
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/topOff.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_topOff_12_12.gif" width="100%" autoplay muted loop></img></td>
            <td>DRL only tops off one marker while LEAPS tops off every marker successfully.</td>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            Maze 
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/randomMaze.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_randomMaze_8_8.gif" width="100%" autoplay muted loop></img></td>
            <td>Both methods learn to successfully navigate the maze. </td>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            CleanHouse 
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/cleanHouse.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_cleanHouse_14_22.gif" width="100%" autoplay muted loop></img></td>
            <td>LEAPS is able to clean one room, while DRL doesn't learn meaningful behaviors.</td>
          </tr>
          <tr>
          <td style="width:1%">
            <center><div style="font-size:25px; transform:rotate(270deg)">
            Harvester
            </div></center>
          </td>
            <td><img src = "resources/drl_karel_gifs/harvester.gif" width="99.8%" autoplay muted loop></img></td>
            <td><img src = "resources/leaps_karel_gifs/pred_one_for_all_harvester_8_8.gif" width="100%" autoplay muted loop></img></td>
            <td>Both partially harvest the markers.</td>
          </tr>
        </table>
        </div>
        <br>
          <div style="width:800px; margin:0 auto;" align="justify">
          LEAPS also performs well quantitatively: it achieves the best performance on 5 of the 6 tasks when compared to a wide array of
          DRL and program synthesis baselines:
        <center><br><img src="resources/LEAPS_perf_table.png" width=800px><br></center>
      </div>
      <br/><hr>


      <!-- ################### POLICY ROLLOUTS #################### -->

      <div style="width:800px; margin:0 auto; text-align:center">
        <center><h1>How much better does LEAPS generalize?</h1></center>
        <center><br><img src="resources/LEAPS_generalization_fig.png" width=800px><br></center>
        
      </div>
      <br><div style="width:800px; margin:0 auto;" align="justify">
        We test zero-shot generalization by training policies on the original, small grids and transferring them
        to much larger 100x100 instances. 
        <center><br><img src="resources/LEAPS_generalization_table.png" width=800px><br></center>
      </div><br><hr>


      <!-- ################### INTERPRETABILITY #################### -->
      <div style="width:800px; margin:0 auto; text-align:center">
        <center><h1>Are LEAPS policies interpretable?</h1></center>
      </div>
      <div style="overflow-x:auto;">
      <table align=center width=800px>
        <tr>
        <td style="width:50%">
          <video src="resources/human_interpretability.mp4" width="100%" autoplay muted loop></video>
        </td>
        <td style="width:50%">
          <img src="resources/perf_improvement.png" width="100%"></img>
        </td>
        </tr>
      </table>
    </div>
      <br><div style="width:800px; margin:0 auto;" align="justify">
      Synthesized programs are not only readable but also editable: users with only a basic understanding of programming can diagnose them and make edits that improve their performance. To test this hypothesis, we asked people with programming experience who were unfamiliar with our DSL and the Karel tasks to edit suboptimal LEAPS programs to improve performance as much as possible.
      Performance increases significantly on all three tasks: by 97.1% on average with just 3 edits and by 125% on average with 5 edits.
      </div>
      <br>



      <!-- ################### EMBEDDING SPACE #################### 
      <hr>
      <div style="width:800px; margin:0 auto; text-align=center">
        <center><h1>What does LEAPS' program embedding space look like?</h1></center>
      </div>
          <center><img src = "resources/leaps_embedding_vis.png" width="400px"></img><br></center>
      <br><div style="width:800px; margin:0 auto;" align="justify">
        We perform dimensionality reduction with PCA to embed encoded programs from the training dataset, samples drawn from a normal distribution, programs from the testing dataset, and programs reconstructed by models to a 2D space. Compared to the normal distribution, the shape of the encoded program is more "twisted", suggesting the effectiveness of the proposed latent behavior reconstruction objective.
      </div></br> -->

      <!-- ################### CEM #################### -->
      <hr>
      <div style="width:800px; margin:0 auto; text-align:center">
        <center><h1>How does CEM search?</h1></center>
      </div>
      <!-- <br/> -->
          <center><img src = "resources/cem.gif" width="400px"></img><br></center>
      <br><div style="width:800px; margin:0 auto;" align="justify">
        We visualize the search trajectory of CEM, our program search algorithm, over the latent program embedding space for the StairClimber task. Over time, the search converges toward the ground-truth program, marked with a red star. The trajectory appears to pass through the ground-truth program halfway through the search; this is a visual distortion introduced by projecting the 256-dimensional space down to 2D.
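        <br><br>
        The search loop can be sketched roughly as follows. This is a minimal, generic Cross-Entropy Method over a continuous vector, not the authors' implementation: <code>cem_search</code>, its parameters, and the toy reward are all hypothetical, and <code>reward_fn</code> stands in for decoding a latent program, executing it, and returning the task reward. A 3-dimensional toy space replaces the 256-dimensional one.
        <pre><code style="display:block; white-space:pre-wrap">
```python
import random
import statistics


def cem_search(reward_fn, dim, iterations=30, population=64,
               elite_frac=0.125, init_std=2.0, seed=0):
    """Generic Cross-Entropy Method over a continuous latent vector.

    reward_fn is a placeholder for "decode the latent program, execute it,
    and return the reward"; all default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    n_elite = max(2, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate latent vectors from the current Gaussian.
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(population)]
        # Keep the highest-reward candidates (the elites).
        elites = sorted(candidates, key=reward_fn, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites, one dimension at a time.
        columns = list(zip(*elites))
        mean = [statistics.fmean(col) for col in columns]
        std = [statistics.pstdev(col) + 1e-6 for col in columns]
    return mean


# Toy reward: negative squared distance to a hidden target vector.
target = [0.5, -1.0, 2.0]


def toy_reward(z):
    return -sum((a - b) ** 2 for a, b in zip(z, target))


best = cem_search(toy_reward, dim=len(target))
```
        </code></pre>
        In this toy setting the sampling distribution contracts around the hidden target, which mirrors the convergence toward the red star in the animation above.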
      </div><br><hr>

      <!-- ################### CITATION #################### -->
      <div style="overflow-x: auto;">
      <table align=center width=850px>
        <center><h1>Citation</h1></center>
        <tr>
        <td width=100%>
        <pre><code style="display:block; white-space:pre-wrap">
    @inproceedings{trivedi2021leaps,
        author={Dweep Trivedi and Jesse Zhang and Shao-Hua Sun and Joseph J. Lim},
        booktitle = {Advances in Neural Information Processing Systems},
        title={Learning to Synthesize Programs as Interpretable and Generalizable Policies}, 
        url = {https://arxiv.org/pdf/2108.13643.pdf},
        volume = {34},
        year = {2021}
    }
        </code></pre>
          </td>
          </tr>
      </table>
    </div>
    <br><hr>


      <!-- <div style="width:800px; margin:0 auto; text-align=center">
        <br>
        <center>Code and full paper to be released soon.</center>
      </div> -->

<script xml:space="preserve" language="JavaScript">
hideallbibs();
</script>
</div>
</body>
</html>
