<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <meta name="description"
        content="OMNINAV: A UNIFIED FRAMEWORK FOR PROSPECTIVE EXPLORATION AND VISUAL-LANGUAGE NAVIGATION">
  <meta name="keywords" content="unified framework, active exploration">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>OMNINAV: A UNIFIED FRAMEWORK FOR PROSPECTIVE EXPLORATION AND VISUAL-LANGUAGE NAVIGATION</title>
  <style>
    .item video {
      width: 224px;
      height: 224px;
      object-fit: cover;
    }
  </style>
  <style>
    .container_txt {
      display: flex;
      justify-content: center;
      width: 100%;
    }
    .container_txt span:not(:last-child) {
      margin-right: 220px;
    }
    .container_txt span:first-child {
      margin-right: 220px;
    }
  </style>
  <style>
    .video-row {
      display: flex;
      justify-content: space-between;
      margin-bottom: 0px; 

    }
    
    .video-row video {
      width: calc(33% - 10px); 
      height: 200px; 
      object-fit: cover;
    }
  
    .caption {
      text-align: center;
      margin-top: 5px; 
    }
  </style>
  <!-- Global site tag (gtag.js) - Google Analytics -->
  <script async src="https://www.googletagmanager.com/gtag/js?id=G-PYVRSFMDRL"></script>
  <script>
    window.dataLayer = window.dataLayer || [];

    function gtag() {
      dataLayer.push(arguments);
    }

    gtag('js', new Date());

    gtag('config', 'G-PYVRSFMDRL');
  </script>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
        rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet"
        href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">
  <link rel="icon" href="./static/images/favicon.svg">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>
<body>

<nav class="navbar" role="navigation" aria-label="main navigation">
  <div class="navbar-brand">
    <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
      <span aria-hidden="true"></span>
      <span aria-hidden="true"></span>
      <span aria-hidden="true"></span>
    </a>
  </div>
</nav>


<section class="hero">
  <div class="hero-body">
    <div class="container is-max-desktop">
      <div class="columns is-centered">
        <div class="column has-text-centered">
          <h1 class="title is-1 publication-title">OMNINAV: A UNIFIED FRAMEWORK FOR PROSPECTIVE EXPLORATION AND VISUAL-LANGUAGE NAVIGATION</h1>
        </div>
      </div>
    </div>
  </div>
</section>

<section class="hero teaser">
  <div class="container is-max-desktop">
    <div class="hero-body">
    <img src="./static/images/structure.jpg" alt="Overview of the OmniNav framework structure">
      <div class="container_txt">
        <!-- <span>Input Image</span>  -->
      </div>
    </div>
  </div>
</section>



<section class="section">
  <div class="container is-max-desktop">
    <!-- Abstract. -->
    <div class="columns is-centered has-text-centered">
      <div class="column is-four-fifths">
        <h2 class="title is-3">Abstract</h2>
        <div class="content has-text-justified">
          <p>
Embodied navigation is a foundational challenge for intelligent robots, demanding the ability to comprehend visual environments, follow natural language instructions, and explore autonomously. However, existing models struggle to provide a unified solution across heterogeneous navigation paradigms, often yielding low success rates and limited generalization. We present OmniNav, a unified framework that handles instruct-goal, object-goal, and point-goal navigation as well as frontier-based exploration within a single architecture. First, we introduce a lightweight, low-latency policy that predicts continuous-space waypoints (coordinates and orientations) with high accuracy, outperforming action-chunk methods in precision and supporting real-world deployment at control frequencies of up to 5 Hz. Second, at the architectural level, OmniNav adopts a fast-slow system design: a fast module generates waypoints from relatively short-horizon visual context and subtasks, while a slow module conducts deliberative planning using long-horizon observations and candidate frontiers to select the next subgoal and subtask. This collaboration improves path efficiency and maintains trajectory coherence in exploration and memory-intensive settings. Notably, we find that the primary bottleneck lies not in navigation policy learning per se, but in robust understanding of general instructions and objects. To enhance generalization, we incorporate large-scale general-purpose training datasets, including those used for image captioning and other visual tasks, into a joint multi-task regimen, which substantially boosts success rates and robustness. Extensive experiments demonstrate state-of-the-art performance across diverse navigation benchmarks, and real-world deployment further validates the approach. OmniNav offers practical insights for embodied navigation and points to a scalable path toward versatile, highly generalizable robotic intelligence.</p>
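          <p>
The fast-slow division of labor described above can be pictured with a minimal Python sketch. It is an illustration only, not the OmniNav implementation: the function names, the nearest-frontier rule, and the straight-line waypoint generator are hypothetical stand-ins for the learned slow and fast modules.
          </p>
<pre>
```python
import math


def slow_select_subgoal(position, frontiers):
    """Slow-module stand-in (assumption): choose the nearest candidate frontier."""
    return min(frontiers, key=lambda f: math.dist(position, f))


def fast_waypoints(position, subgoal, step=0.5, horizon=5):
    """Fast-module stand-in (assumption): emit (x, y, yaw) waypoints toward the subgoal."""
    x, y = position
    # Heading that points from the current position to the subgoal.
    yaw = math.atan2(subgoal[1] - y, subgoal[0] - x)
    waypoints = []
    for _ in range(horizon):
        remaining = math.dist((x, y), subgoal)
        if 1e-6 >= remaining:
            break  # subgoal reached, nothing left to emit
        move = min(step, remaining)
        x += move * math.cos(yaw)
        y += move * math.sin(yaw)
        waypoints.append((x, y, yaw))
    return waypoints


# One slow decision followed by one fast rollout.
subgoal = slow_select_subgoal((0.0, 0.0), [(3.0, 4.0), (1.0, 0.0)])
print(fast_waypoints((0.0, 0.0), subgoal))
```
</pre>
          <p>
In this sketch the slow step runs once per subgoal while the fast step can be called repeatedly at a higher rate, which mirrors the split between deliberative planning and waypoint generation.
          </p>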
        </div>
      </div>
    </div>

    
  </div>
</section>


<section class="section">
  
 
  <div class="container is-max-desktop">
    <div class="columns is-centered">
      <div class="column is-full-width">
        <!-- <h2 class="title is-3">Animation</h2> -->
        <!-- <h3 class="title is-4">Qualitative results.</h3> -->
        <img src="./static/images/think_bev.jpg" alt="CoT reasoning by the slow-thinking system during exploration">
        <p>
          <b>CoT reasoning by the slow-thinking system for exploration.</b> For the “find the bathtub”
          task, the model reasons over the frontier set using memory and semantic priors (e.g., bathrooms are
          more likely near bedrooms and away from dining areas), iteratively generating subgoals for the fast
          system to execute.
        </p>
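        <p>
          As a rough illustration of reasoning over a frontier set with memory and semantic priors, the Python sketch below ranks unvisited frontiers by a hand-written prior. This is a simplified stand-in, not the model's chain-of-thought procedure: the prior values, the frontier records, and the function name are all hypothetical.
        </p>
<pre>
```python
# Hypothetical prior: how strongly each observed room type suggests a bathtub nearby.
BATHTUB_PRIOR = {"bedroom": 1.0, "hallway": 0.25, "dining area": -1.0}


def rank_frontiers(frontiers, prior, visited):
    """Order unvisited frontiers by the summed prior of the room labels seen near them."""
    candidates = [f for f in frontiers if f["id"] not in visited]

    def score(frontier):
        return sum(prior.get(room, 0.0) for room in frontier["nearby"])

    return sorted(candidates, key=score, reverse=True)


# Toy frontier set; "visited" plays the role of memory.
frontiers = [
    {"id": "A", "nearby": ["dining area"]},
    {"id": "B", "nearby": ["bedroom", "hallway"]},
    {"id": "C", "nearby": ["hallway"]},
]
ranked = rank_frontiers(frontiers, BATHTUB_PRIOR, visited={"C"})
print([f["id"] for f in ranked])
```
</pre>
        <p>
          The top-ranked frontier would then be handed to the fast system as the next subgoal, and the ranking repeated once new observations arrive.
        </p>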
        
      </div>
    </div>
    <h2 class="title is-3" >Real-World Deployment</h2>

    <div class="columns is-centered">
      <div class="column is-full-width has-text-centered">
        <video autoplay controls muted loop style="width: 100%; max-width: 960px; margin: auto; border-radius: 10px;">
          <source src="./static/process_videos/mergevideo.mp4" type="video/mp4">
        </video>
        <p style="text-align: center; margin-top: 10px; margin-bottom: 30px;">
         Multi-view video of real-world deployment
        </p>
      </div>
    </div>


    <div class="columns is-centered">
      

    
      <div class="column">
        <div class="columns is-centered">
          <div class="column content">
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/1.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal: Find a girl wearing a pink T-shirt</p>
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/2.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal_long: Get out of the room and find me a water dispenser.</p>
            
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/4.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal_short: I want to take out the trash (find a trash can)</p>
          
            
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/3.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">instruct-goal: Go into the first room on the left and find a chair</p>
</div>
        </div>
      </div>
      <!-- new -->
      <div class="column">
        <div class="columns is-centered">
          <div class="column content">
            <video id="video" autoplay controls muted loop width="500" height="290">
              <source src="./static/process_videos/5.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal: Find a trash can</p>
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/7.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal: Find a sofa</p>
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/6.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">instruct-goal: Go forward to the first intersection, then turn left and find a trash can and park in front of it</p>
           
                      <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/11.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">point-goal: Avoid sofas and people</p>
          </div>
          
        </div>
      </div>

      <div class="column">
        <div class="columns is-centered">
          <div class="column content">
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/9.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal: Find a vending machine</p>
            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/10.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">obj-goal: Avoid chairs in narrow spaces</p>

            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/8.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">instruct-goal: Go into the first room on the right and then find a boy. Stop in front of him.</p>
          

            <video id="video" autoplay controls muted loop width="500" height="300">
              <source src="./static/process_videos/12.mp4"
                      type="video/mp4">
            </video>
            <p style="text-align: center; margin-bottom: 20px; height: 40px; line-height: 20px;">point-goal: Local obstacle avoidance</p>
          </div>
        </div>
      </div>

    </div>
    <div class="columns is-centered" style="margin-top: 50px;">
      <div class="column is-full-width">
        <h2 class="title is-3 has-text-centered">Performance of the Slow System in Simulation</h2>
      </div>
    </div>

    <div class="columns is-centered">
      <div class="column">
        <video autoplay controls muted loop style="width: 100%; border-radius: 10px;">
          <source src="./static/process_videos/instance1.mp4" type="video/mp4">
        </video>
        <p class="has-text-centered" style="margin-top: 5px;">Explore the area until you locate a dishwasher. Stop when you've reached its location</p>
      </div>

      <div class="column">
        <video autoplay controls muted loop style="width: 100%; border-radius: 10px;">
          <source src="./static/process_videos/instance2.mp4" type="video/mp4">
        </video>
        <p class="has-text-centered" style="margin-top: 5px;">Could you help me find a plant? Show me the way</p>
      </div>
    </div>
  
    <!-- Animation. -->
    <div class="columns is-centered">
      <div class="column is-full-width">
        <h2 class="title is-4">Qualitative comparison between baselines and our approach</h2>
        <img src="./static/images/result1.jpg" alt="Qualitative comparison with baselines, part 1">
        <img src="./static/images/result2.jpg" alt="Qualitative comparison with baselines, part 2">



        

        <br/>
 
      </div>
    </div>
  </div>
</section>




<!-- <footer class="footer">
  <div class="container">
    <div class="content has-text-centered">
    </div>
    <div class="columns is-centered">
      <div class="column is-8">
        <div class="content">
          <p>
            This website is borrowed from <a
              href="https://github.com/nerfies/nerfies.github.io">source code</a> 
          </p>
        </div>
      </div>
    </div>
  </div>
</footer> -->

</body>
</html>
