<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
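    <meta name="description"
          content="Tree-guided Diffusion Planner (TDP) is a training-free test-time planning framework that balances exploration and exploitation through structured trajectory generation.">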
    <meta name="keywords" content="Tree-guided Diffusion Planner">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Tree-guided Diffusion Planner</title>

    <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
          rel="stylesheet">

    <link rel="stylesheet" href="./static/css/bulma.min.css">
    <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
    <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
    <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
    <link rel="stylesheet"
          href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
    <link rel="stylesheet" href="./static/css/index.css">
    <link rel="icon" href="./static/images/favicon.svg">

    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
    <script defer src="./static/js/fontawesome.all.min.js"></script>
    <script src="./static/js/bulma-carousel.min.js"></script>
    <script src="./static/js/bulma-slider.min.js"></script>
    <script src="./static/js/index.js"></script>
    <!-- MathJax Configuration (defined before the loader so it is always picked up) -->
    <script type="text/javascript">
        MathJax = {
            tex: {
                inlineMath: [['$', '$']],
                displayMath: [['$$', '$$']],
            }
        };
    </script>

    <!-- Load MathJax -->
    <script type="text/javascript" async
        src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js">
    </script>

</head>
<body>


<section class="hero is-dark">
    <div class="hero-body">
      <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
          <div class="column has-text-centered">
                    <h1 class="title is-3 publication-title">
                        Tree-guided Diffusion Planner <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  />
                    </h1>
                      <div class="is-size-5 publication-authors">
                        <span class="author-block">
                          Anonymous authors
                        </span>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<br>

<section class="hero teaser">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="hero">
            <br>
            <h2 class="subtitle">
                <strong>Tree-guided Diffusion Planner (TDP)</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> is a flexible training-free test-time planning framework that balances exploration and exploitation through structured trajectory generation. 
                It addresses the limitations of gradient-based guidance by exploring diverse trajectory regions and harnessing gradient information across the expanded solution space.
            </h2>
            <div style="text-align: center;">
                <img src="./static/images/overview_method.png" class="interpolation-image" alt="header-image." width="45%"/>
            </div>
            
            <br>
            <h2 class="subtitle">
                <strong>(1) Parent Branching</strong>: diverse parent trajectories are produced via fixed-potential particle guidance <a href="#reference" style="color: gray;">[1]</a> to encourage broad exploration.
                <br>
                <br>
                <strong>(2) Sub-Tree Expansion</strong>: sub-trajectories are locally refined through fast conditional denoising guided by task objectives.
                <br>
                <br>
                <strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> consistently outperforms state-of-the-art planning approaches across a wide range of guidance functions, including <u><a href="#pnwp" style="color: red;">non-convex</a></u> objectives, <u><a href="#maze2d-gold-picking" style="color: red;">non-differentiable constraints</a></u>, and <u><a href="#multi-reward" style="color: red;">multi-reward</a></u> structures. 
            </h2>

            <br>
            <br>
            <br>
        </div>
    </div>
</section>

<!-- <section class="section hero is-light">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered">Problem Setting</h2>
                <div class="content has-text-justified">

                    <div class="content has-text-justified">
                        <p>
                            We consider test-time reward maximization problem with the pretrained planner model where the agent has access to the user-defined guide function $\mathcal{J}(\boldsymbol{\tau})$ which indicates the fitness of the generated trajectory $\boldsymbol{\tau}$. 
                            As per-timestep reward does not guarantee the optimality of a low-level action (e.g., non-convex reward landscope), planning capability based on exploration is required to find the optimal trajectory $\hat{\boldsymbol{\tau}}$ that maximizes $\mathcal{J}$. 
                            The agent must find an action sequence that maximizes the guide score within a limited number of steps: 

                        </p>
                    </div>
                   $$ 
                   \begin{equation}
                        \hat{\boldsymbol{\tau}} = \hat{\boldsymbol{a}}_{1:\hat{T}} = \arg\max_{T,\; \boldsymbol{a}_{1:T}} \; \mathcal{J} (\boldsymbol{s}_0, \boldsymbol{a}_{1:T}) \quad \text{subject to} \quad T_{\text{pred}} \leq T \leq T_{\text{max}}
                    \label{problem_setting}
                    \end{equation}
                    $$
                    
                    <div class="content has-text-justified">
                        <p>
                            Planning horizon $T_{\text{pred}}$ is determined by the choice of planner model. 
                            Model-free RL methods with single step execution predict a single action at each timestep so $T_{\text{pred}}=1$, whereas diffusion planner predicts a sequence of actions $\boldsymbol{a}_{1:T_{\text{pred}}}$ at once. 

                        </p>
                        <p>
                            The standard approach to guide diffusion planning in test time is to use naive gradient guidance, which progressively refines the denoising process by combining the score estimate from the unconditional diffusion model with the auxiliary guide function. 
                            It approximates the reverse denoising process as Gaussian with small perturbation if the guidance distribution $h(\boldsymbol{\tau}_i)$ is sufficiently smooth and the gradient of the guide function is time-independent: 
                        </p>
                    </div>

                    $$
                    \tilde{p}(\boldsymbol{\tau}_{i-1}|\boldsymbol{\tau}_i) \propto p_\theta(\boldsymbol{\tau}_{i-1}|\boldsymbol{\tau}_i)h(\boldsymbol{\tau}_i) \approx \mathcal{N}(\boldsymbol{\tau}_{i-1}; \mu+\alpha\Sigma g, \Sigma)
                    $$

                    <div class="content has-text-justified">
                        <p>
                            where $g=\nabla_\tau \log h(\boldsymbol{\tau}_i)$ is the gradient of the guidance distribution, $\alpha$ is guidance strength, and $\mu, \Sigma$ are the mean and covariance of the pretrained reverse denoising process <a href="#reference" style="color: red;">[2]</a>.
                        </p>
                
                </div>
                <br>

            </div>
        </div>
    </div>
</section> -->

<section class="section hero is-light">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered">Method</h2>
                <div class="content has-text-justified">

                    <!-- <h3 class="title is-5">State Decomposition</h3> -->
                    <!-- <p> -->
                        <!-- In many robotic planning tasks, individual states encode multiple temporal information, yet only a subset is directly evaluated by the guide function during test-time planning.  -->
                    <!-- We introduce a structural decomposition of the state into observation and control components, reflecting their distinct roles in task evaluation and physical execution. 
                    We refer to the states that are directly evaluated by the guide function as <strong><i>observation</i></strong> states. On the other hand, <strong><i>control</i></strong> states are not directly scored but are critical for ensuring feasibility and smooth transitions.  -->
                    <!-- These $\textit{control}$ states govern the underlying system dynamics and ultimately support achieving high-level objectives.  -->
                    <!-- </p> -->
                    <h3 class="title is-5">Parent Branching</h3>
                    <p>
                        <!-- In the first phase, we apply fixed-potential particle guidance (PG) <a href="#reference" style="color: red;">[1]</a> to promote diversity among generated control trajectories.  -->
                    Unlike conventional gradient-based guidance, which pulls samples toward high-reward regions, particle guidance introduces repulsive interactions: it computes pairwise distances between trajectory samples, specifically over the <i>control</i> states, and pushes the samples apart in state space. 
                    <!-- This leads to broad coverage across dynamically feasible trajectories without requiring a predefined task objective.  -->
                    A single denoising step is defined as: 
                    </p>                    
                    $$
                    \left[\boldsymbol{\mu}^{i}_{\text{control}},\; \boldsymbol{\mu}^{i}_{\text{obs}}\right] \leftarrow \boldsymbol{\mu}_{\theta}(\boldsymbol{\tau}^{i})
                    $$ 
                    $$
                    \boldsymbol{\mu}^{i}_{\text{control}} \leftarrow \boldsymbol{\mu}^{i}_{\text{control}} + \alpha_p \Sigma^i \nabla \Phi(\boldsymbol{\mu}^{i}_{\text{control}}), \quad
                    \boldsymbol{\mu}^{i}_{\text{obs}} \leftarrow \boldsymbol{\mu}^{i}_{\text{obs}} + \alpha_g \Sigma^i \nabla \mathcal{J}(\boldsymbol{\mu}^{i}_{\text{obs}})
                    $$
                    $$
                    \boldsymbol{\mu}^{i} \leftarrow \left[\boldsymbol{\mu}^{i}_{\text{control}},\; \boldsymbol{\mu}^{i}_{\text{obs}}\right]
                    $$
                    $$
                    \boldsymbol{\tau}^{i-1} \sim \mathcal{N}(\boldsymbol{\mu}^{i}, \Sigma^i)
                    $$
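                    <p>
                    where $\boldsymbol{\mu}^i_{\text{control}}$ and $\boldsymbol{\mu}^i_{\text{obs}}$ denote the <i>control</i> and <i>observation</i> components of the predicted mean of the trajectory at denoising step $i$, $\Phi$ is the particle guidance potential, and $\alpha_p$ and $\alpha_g$ are the guidance strengths for particle guidance and gradient guidance, respectively. 
                    <i>Observation</i> states are those scored directly by the guide function $\mathcal{J}$; <i>control</i> states are not scored directly but are critical for feasibility and smooth transitions. 
                    </p>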
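                    <p>
                        As an illustration only (this page does not spell out the form of $\Phi$), one possible fixed potential is a pairwise radial-basis-function kernel over $K$ parent samples. The symbols $\boldsymbol{m}_k$ (the control mean of the $k$-th sample), $K$, and the bandwidth $h$ are assumptions introduced for this sketch: 
                    </p>
                    $$
                    \Phi(\boldsymbol{m}_1, \dots, \boldsymbol{m}_K) = -\sum_{j \lt l} \exp\left(-\frac{\lVert \boldsymbol{m}_j - \boldsymbol{m}_l \rVert^2}{2h^2}\right), \qquad
                    \nabla_{\boldsymbol{m}_k} \Phi = \frac{1}{h^2} \sum_{j \neq k} \exp\left(-\frac{\lVert \boldsymbol{m}_k - \boldsymbol{m}_j \rVert^2}{2h^2}\right) \left(\boldsymbol{m}_k - \boldsymbol{m}_j\right)
                    $$
                    <p>
                        Under this assumed potential, each gradient term points from a neighboring sample $j$ toward sample $k$, so a step along $\nabla \Phi$ moves the samples apart, and the kernel weight makes the closest samples repel each other most strongly. 
                    </p>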
                    <!-- Gradient-based guidance term is optionally applied to steer the <i>observation</i> states, depending on the planning strategy. 
                    Unconditional PG sampling facilitates broad exploration across the data space, making it effective for discovering diverse solutions, whereas conditional PG sampling focuses exploration toward regions aligned with the guide function.  -->

                    <h3 class="title is-5">Sub-Tree Expansion</h3>
                    <p>
                        <!-- In the second phase, we apply a fast denoising process with a reduced number of steps $N_f \ll N$, where $N$ is the original number of diffusion steps, to refine parent trajectories using gradient guidance signals.  -->
                        For each parent trajectory, a branch site is selected at random. The parent trajectory is then partially noised, and a child trajectory is generated by denoising it under gradient guidance, which refines the parent toward the task objective.
                        <!-- This process is defined as:  -->
                    </p>
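                    <p>
                        A sketch of the partial noising step, assuming the standard DDPM forward process. The cumulative noise schedule $\bar{\alpha}$ (unrelated to the guidance strengths $\alpha_p$ and $\alpha_g$) and the reduced step count $N_f$ are assumptions taken from common diffusion notation, not definitions given in this section: 
                    </p>
                    $$
                    \boldsymbol{\tau}^{N_f}_{\text{child}} = \sqrt{\bar{\alpha}_{N_f}}\; \boldsymbol{\tau}_{\text{parent}} + \sqrt{1 - \bar{\alpha}_{N_f}}\; \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I})
                    $$
                    <p>
                        In this reading, the states before the branch site would be held at their parent values, so only the part of the trajectory after the branch site is re-denoised. 
                    </p>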
                    <!-- $$
                    \boldsymbol{\tau}_{\text{child}}^{N_f} \sim q_{N_f}(\boldsymbol{\tau}_{\text{parent}}, \boldsymbol{C}) \quad 
                    \text{where} \; \boldsymbol{C}=\{ \boldsymbol{s}_k \}_{k=0}^{b} \; \text{and} \; b\sim Uniform\left[0, T_{\text{pred}}\right)
                    $$ -->
                    <!-- <p>
                        where $\boldsymbol{C}$ denotes the prefix of the parent trajectory, $q_{N_f}$ is the partial forward noising distribution with $N_f$ denoising steps, and $\boldsymbol{\tau}_{\text{child}}^{N_f}$ is the partially noised trajectory from which the child trajectory is denoised during sub-tree expansion. 
                    </p> -->
                    <p>
                        The full <strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> algorithm is given in the <a href="#algorithm" style="color: red;">Full Algorithm</a> section. 
                    </p>
                </div>

            </div>
        </div>
    </div>
</section>


<section class="section hero">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered" id="maze2d-gold-picking">Maze2D Gold-picking</h2>
                <div class="content has-text-justified">
                    <p>
                        The Maze2D gold-picking task is a planning problem with a test-time <strong><u>non-differentiable constraint</u></strong>: the agent must generate a feasible trajectory that starts at a given initial state, passes through an intermediate goal state (the gold position <img src="./static/images/gold_coin.png" class="no-darkmode-invert" alt="gold-coin Logo" style="height: 1em;">), and ends at a final goal state. 
                        <!-- From the planner’s perspective, an intermediate goal is interpreted as a non-differentiable constraint, since the requirement to pass through a specific state imposes a discrete structural condition that is not captured in the training distribution.  -->
                    </p>
                    <div style="text-align: center;">
                        <img src="./static/images/maze2d_gold_picking_alpha_ablation.png" class="interpolation-image"/>
                    </div>
                    <h5 class="subtitle has-text-centered">
                        Two Gold-picking tasks in Maze2D-Large <a href="#reference" style="color: gray;">[3]</a>.
                    </h5>
                    <div class="content has-text-justified">
                        <p>
                            Gradient-based guidance typically requires selecting a guidance strength $\alpha$ that trades adherence to the guide signal against trajectory fidelity. 
                            However, the best $\alpha$ is highly task-dependent, and tuning it exhaustively for each task adds significant evaluation overhead. 
                            On the Maze2D gold-picking task, the performance of the MCSS (Monte-Carlo Sampling with Selection) baseline varies with $\alpha$, whereas <strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴" style="height: 1em;" /> remains robust across the tested values of $\alpha$. 
                            Here, $\alpha_0$ denotes the guidance strength used in the main paper.
                        </p>
                    </div>


                </div>
                <br>

                
            </div>
        </div>
    </div>
</section>

<section class="section hero is-light">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered" id="pnwp">Pick-and<i>-Where-to-</i>Place ($\texttt{PnWP}$)</h2>
                <div class="content has-text-justified">
                    <div style="text-align: center;">
                        <img src="./static/images/pnwp_description.png" class="interpolation-image" width="50%"/>
                    </div>
                <h5 class="subtitle has-text-centered">
                    $\texttt{PnWP}$ with a Kuka robot arm <a href="#reference" style="color: gray;">[4]</a>.
                </h5>
                    <div class="content has-text-justified">
                        <p>
                            We introduce a <strong><u>non-convex</u></strong> exploration task in a robot-arm manipulation environment.  
                            The agent must infer a suitable placement location from the reward distribution and plan the pick-and-place actions accordingly. 
                            Since $x^*_{local}$ has a wide peak and $x^*_{global}$ has a narrow peak, agents easily get stuck at the local optimum unless the planner explores the trajectory space sufficiently. 
                            Mono-level guided sampling methods (i.e., TAT <a href="#reference" style="color: gray;">[2]</a>, MCSS) tend to converge to the local optimum, often stacking all blocks at $x^*_{local}$. 
                            <!-- While typical $\texttt{PnP}$ tasks only require fitting blocks into a given target configuration, $\texttt{PnWP}$ challenges the planner to distinguish a globally optimal arrangement from a suboptimal local one. -->
                        </p>
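                        <p>
                            To see why gradient guidance alone favors the wide peak, consider a one-dimensional two-Gaussian reward. This form is an assumption chosen for illustration, not the reward used in the experiments; the heights satisfy $A_{\text{global}} \gt A_{\text{local}}$ and the widths satisfy $\sigma_{\text{global}} \lt \sigma_{\text{local}}$: 
                        </p>
                        $$
                        r(x) = A_{\text{local}} \exp\left(-\frac{(x - x^*_{local})^2}{2\sigma_{\text{local}}^2}\right) + A_{\text{global}} \exp\left(-\frac{(x - x^*_{global})^2}{2\sigma_{\text{global}}^2}\right)
                        $$
                        <p>
                            The gradient contributed by each peak is proportional to its own Gaussian factor, so it decays as $\exp\left(-d^2 / 2\sigma^2\right)$ with the distance $d$ from that peak. A sample that starts several $\sigma_{\text{global}}$ away from $x^*_{global}$ therefore receives almost no signal from the narrow peak and follows the wide one, unless exploration first places a sample close to $x^*_{global}$. 
                        </p>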
                        <!-- <p> -->
                            <!-- however, bi-level sampling approach (i.e., <strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  />) is better able to consistently identify globally optimal placements.  -->
                            <!-- In this task, unconditional PG in the parent branching phase enables broad exploration of the trajectory space.  -->
                        <!-- </p> -->
                    </div>
                    <h4 class="title is-4" style="text-align: center;"><strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> is a bi-level search framework.</h4>
                    <div class="grid-container-three-no-border">
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/pnwp_tat.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">TAT<br>(Highest-weighted trajectory)</h5>
                        </div>
                        
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/pnwp_mcss.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">MCSS<br>(Highest-scoring trajectory)</h5>
                        </div>
                        
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/pnwp_tdp.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">TDP <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /></h5>
                        </div>
                </div>
                <br>
            </div>
                
            </div>
        </div>
    </div>
</section>

<section class="section hero">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered" id="multi-reward">AntMaze Multi-goal Exploration</h2>
                <div class="content has-text-justified">

                    <div style="text-align: center;">
                        <img src="./static/images/ant_exploration_description.png" class="interpolation-image"/>
                    </div>
                    <h5 class="subtitle has-text-centered">
                        Multi-goal Exploration in AntMaze-Large <a href="#reference" style="color: gray;">[3]</a>.
                    </h5>
                    <!-- <br> -->
                    <div class="content has-text-justified">
                        <p>
                            We introduce a <strong><u>multi-reward</u></strong> exploration task in the AntMaze environment. 
                            <!-- A priority-aware multi-goal exploration is designed in AntMaze locomotion planning.  -->
                            The diffusion planner predicts the next 64 steps (shown as the bright segment on the map) using a combined Gaussian reward signal from multiple goals. 
                            Goals must be visited in priority order, with higher-priority goals emitting stronger, narrower Gaussians. 
                        </p>
                        <p>
                            For example, as illustrated in the figure above, the first goal the agent visits is $g_2$ at $t = t_3$. If the agent subsequently visits $g_1$, $g_4$, and $g_3$ after $t=t_3$, it successfully reaches all four goals ($g_2 \rightarrow g_1 \rightarrow g_4 \rightarrow g_3$). However, some of the goal priorities are violated. 
                            Specifically, the orderings $g_2 \rightarrow g_4$, $g_2 \rightarrow g_3$, $g_1 \rightarrow g_4$, and $g_1 \rightarrow g_3$ are correct, while $g_2 \rightarrow g_1$ and $g_4 \rightarrow g_3$ violate the intended priority.
                            In this case, while the agent achieves a goal completion score of 4/4, its priority sequence match accuracy is only 4/6.
                            The agent can achieve the maximum accuracy of 6/6 only by visiting all goals in the correct priority order, i.e., $g_1 \rightarrow g_2 \rightarrow g_3 \rightarrow g_4$.
                        </p>
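                        <p>
                            One reading consistent with this example, stated here as an assumption rather than the paper's definition, scores every pair of goals. With $n$ goals and $t(g_i)$ the time at which goal $g_i$ is first visited: 
                        </p>
                        $$
                        \text{accuracy} = \frac{1}{\binom{n}{2}} \sum_{i \lt j} \mathbb{1}\left[\, t(g_i) \lt t(g_j) \,\right]
                        $$
                        <p>
                            For $n = 4$ there are $\binom{4}{2} = 6$ pairs. The visit order $g_2 \rightarrow g_1 \rightarrow g_4 \rightarrow g_3$ satisfies four of them and violates two, which gives $4/6 \approx 0.67$. How pairs involving unvisited goals are counted is not specified in this section. 
                        </p>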
                        <!-- <p>
                            In this task, conditional PG in the parent branching phase plays a key role in improving both goal completion score and priority sequence match accuracy. 
                        </p> -->
                    </div>
                    <h4 class="title is-4" style="text-align: center;"><strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> reaches more goals with higher sequence accuracy.</h4>
                    <div class="grid-container-three-no-border">
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/antmaze_mcss_1.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">MCSS<br>(X)</h5>
                        </div>
                        
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/antmaze_mcss_2.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">MCSS<br>($g_2 \rightarrow g_3 \rightarrow $ X)</h5>
                        </div>
                        
                        <div class="content has-text-justified">
                            <video
                                controls
                                muted
                                preload
                                playsinline
                                width="100%"
                                autoplay
                                loop>
                                <source src="static/videos/antmaze_tdp_1.mp4" type="video/mp4">
                            </video>
                            <h5 class="subtitle has-text-centered">TDP <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /><br>($g_1 \rightarrow g_2 \rightarrow g_3 \rightarrow g_4$)</h5>
                        </div>
                    </div>

                    <h4 class="title is-4" style="text-align: center;"><strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴"  style="height: 1em;"  /> completes tasks in fewer timesteps.</h4>
                    <div class="grid-container-two-no-border-resize">
                        <div class="column">
                            <div class="content has-text-justified">
                              <video id="video1" muted preload="auto" playsinline width="100%">
                                <source src="static/videos/antmaze_mcss_3.mp4" type="video/mp4">
                              </video>
                              <h5 class="subtitle has-text-centered">MCSS<br>($g_1 \rightarrow g_4 \rightarrow g_3 \rightarrow g_2$, slow)</h5>
                            </div>
                          </div>
                        
                          <div class="column">
                            <div class="content has-text-justified">
                              <video id="video2" muted preload="auto" playsinline width="100%">
                                <source src="static/videos/antmaze_tdp_2.mp4" type="video/mp4">
                              </video>
                              <h5 class="subtitle has-text-centered"><strong>TDP</strong> <img src="https://emojicdn.elk.sh/🌴" style="height: 1em;" /><br>($g_1 \rightarrow g_4 \rightarrow g_3 \rightarrow g_2$, fast)</h5>
                            </div>
                          </div>
                    </div>
                    <p>
                        In these videos, <strong>COMPLETE</strong> indicates that the agent has successfully visited all four goals, and <strong>SUCCESS</strong> indicates that the agent has successfully visited all four goals in the <u>correct order</u>.
                    </p>
                </div>
                <!-- <br> -->
            </div>
                
        </div>
    </div>
</section>

<section class="section hero is-light">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered" id="algorithm">Full Algorithm</h2>
                <div class="content has-text-justified">

                    <div style="text-align: center;">
                        <img src="./static/images/algorithm.png" class="interpolation-image" width="75%"/>
                    </div>
                </div>
            </div>
        </div>
    </div>
</section>

<section class="section hero">
    <div class="container is-max-desktop" style="max-width: 1200px;">
        <div class="columns is-centered">
            <div class="column is-full-width">
                <h2 class="title is-3 has-text-centered" id="reference">References</h2>
                <div class="content has-text-justified">

                    <div class="content has-text-justified">
                        <p>
                            [1]: Gabriele Corso, Yilun Xu, Valentin De Bortoli, Regina Barzilay, and Tommi S. Jaakkola. Particle guidance: non-i.i.d. diverse sampling with diffusion models. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=KqbCvIFBY7.
                            
                        </p>
                        <!-- <p>
                            [2]: Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics, 2015. URL https://arxiv.org/abs/1503.03585.
                        </p> -->
                        <p>
                            [2]: Lang Feng, Pengjie Gu, Bo An, and Gang Pan. Resisting stochastic risks in diffusion planners with the trajectory aggregation tree, 2024. URL https://arxiv.org/abs/2405.17879.
                        </p>
                        <p>
                            [3]: Justin Fu, Aviral Kumar, Ofir Nachum, George Tucker, and Sergey Levine. D4RL: Datasets for deep data-driven reinforcement learning, 2021. URL https://arxiv.org/abs/2004.07219.
                        </p>
                        <p>
                            [4]: Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. PDDLStream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning, 2020. URL https://arxiv.org/abs/1802.08705.
                        </p>
                    </div>


                </div>
                <br>
            </div>
                
        </div>
    </div>
</section>

<footer class="footer" style="background-color: #f5f5f5">
    <div class="container">
        <div class="columns is-centered">
            <div class="column is-8">
                <div class="content">
                    <p>
                        This website is based on the <a href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA
                        4.0</a>-licensed
                        <a rel="template" href="https://github.com/nerfies/nerfies.github.io">Nerfies website</a>.
                    </p>
                </div>
            </div>
        </div>
    </div>
</footer>

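<script>
    // Keep the two AntMaze comparison videos in sync: both restart only after
    // each has finished, so the slower clip sets the loop length.
    const video1 = document.getElementById("video1");
    const video2 = document.getElementById("video2");
  
    let video1Ended = false;
    let video2Ended = false;
  
    // play() returns a promise that rejects if the browser blocks autoplay;
    // swallow the rejection so it does not surface as an uncaught error.
    function safePlay(video) {
      const result = video.play();
      if (result !== undefined) {
        result.catch(() => {});
      }
    }
  
    // Restart both videos from the beginning once both have ended.
    function tryReplay() {
      if (video1Ended && video2Ended) {
        video1.currentTime = 0;
        video2.currentTime = 0;
        safePlay(video1);
        safePlay(video2);
        video1Ended = false;
        video2Ended = false;
      }
    }
  
    video1.addEventListener("ended", () => {
      video1Ended = true;
      tryReplay();
    });
  
    video2.addEventListener("ended", () => {
      video2Ended = true;
      tryReplay();
    });
  
    // Autoplay initially
    safePlay(video1);
    safePlay(video2);
</script>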


</body>
</html>



