<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Policy Demonstrations</title>
  <style>
    body {
      font-family: Arial, sans-serif;
      max-width: 900px;
      margin: 0 auto;
      padding: 20px;
      line-height: 1.6;
      color: #333;
    }
    h1, h2, h3 {
      color: #222;
    }
    .video-section {
      margin-bottom: 40px;
    }
    .video-pair {
      display: flex;
      justify-content: space-between;
      gap: 4%;
      margin-bottom: 20px;
    }
    .video-item {
      display: flex;
      flex-direction: column;
      width: 48%;
    }
    .video-item video {
      width: 100%;
      border: 1px solid #ccc;
      border-radius: 4px;
    }
    .video-item p {
      margin: 8px 0 0;
      font-size: 0.95em;
      text-align: center;
    }
    .task-desc {
      margin-bottom: 30px;
    }
    .task-desc h3 {
      margin-top: 20px;
    }
    .task-desc p {
      margin: 4px 0;
    }
  </style>
</head>
<body>

  <h1>Demonstrations of Uncertainty-Sensitive Privileged Learning (USPL)</h1>

  <!-- PP-DP Behavior Divergence Section -->
  <section class="video-section">
    <h2>1. PP-DP Behavior Divergence</h2>
    <p>This section compares the behavior of the Deployment Policy (DP) and the Privileged Policy (PP). Each row shows the baseline algorithm (RMA) on the left and our method (USPL) on the right. In every task except Blind-Mass Stack, the transparent robot represents the PP and the other robot represents the DP. In Blind-Mass Stack, each video shows the DP on the left and the PP on the right.</p>

    <!-- Franka -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/franka.mp4"></video>
        <p>Blind-Mass Stack (RMA). Left: DP, Right: PP</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/franka.mp4"></video>
        <p>Blind-Mass Stack (USPL). Left: DP, Right: PP</p>
      </div>
    </div>

    <!-- Lateral Choice -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/lateral choice.mp4"></video>
        <p>Lateral Choice (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/lateral choice.mp4"></video>
        <p>Lateral Choice (USPL)</p>
      </div>
    </div>

    <!-- Midpoint Choice -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/midpoint choice.mp4"></video>
        <p>Midpoint Choice (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/midpoint choice.mp4"></video>
        <p>Midpoint Choice (USPL)</p>
      </div>
    </div>

    <!-- Quadrotor -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/quadrotor.mp4"></video>
        <p>Biased Quadrotor (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/quadrotor.mp4"></video>
        <p>Biased Quadrotor (USPL)</p>
      </div>
    </div>

    <!-- Signpost Nav -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/signpost nav.mp4"></video>
        <p>Signpost Nav (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/signpost nav.mp4"></video>
        <p>Signpost Nav (USPL)</p>
      </div>
    </div>

    <!-- Square Maze -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/square maze.mp4"></video>
        <p>Square Maze (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/square maze.mp4"></video>
        <p>Square Maze (USPL)</p>
      </div>
    </div>

    <!-- Stairway Search -->
    <div class="video-pair">
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/RMA/stairway search.mp4"></video>
        <p>Stairway Search (RMA)</p>
      </div>
      <div class="video-item">
        <video controls src="PP-DP-Behavior-Divergence/USPL/stairway search.mp4"></video>
        <p>Stairway Search (USPL)</p>
      </div>
    </div>
    <p>
      In these videos, the behavioral discrepancy between the DP and the PP is visibly smaller under USPL than under RMA: the USPL trajectories of the two policies overlap for most of each episode.
    </p>
  </section>

  <!-- DP-Behaviour and Uncertainty Visualization Section -->
  <section class="video-section">
    <h2>2. Behavior and Privileged Prediction Visualization</h2>
    <p>This section shows USPL robot trajectories alongside the predicted privileged observations. The standard deviation of each prediction indicates its uncertainty. In Blind-Mass Stack and Biased Quadrotor, the semi-transparent red, blue, and green blocks show the range of the predicted privileged observations, and the dashed lines marked by the two triangles show the actual privileged observation.</p>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/cube stack/green_blue_red.mp4"></video>
        <p>Blind-Mass Stack</p>
      </div>
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/lateral choice.mp4"></video>
        <p>Lateral Choice</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/midpoint choice (image).mp4"></video>
        <p>Midpoint Choice (image)</p>
      </div>
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/midpoint choice.mp4"></video>
        <p>Midpoint Choice</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/quadrotor.mp4"></video>
        <p>Biased Quadrotor</p>
      </div>
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/signpost nav （image）.mp4"></video>
        <p>Signpost Nav (image)</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/signpost nav.mp4"></video>
        <p>Signpost Nav</p>
      </div>
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/stairway search.mp4"></video>
        <p>Stairway Search</p>
      </div>
    </div>

    <div class="video-pair">
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/square maze/bottom_left.mp4"></video>
        <p>Square Maze (Goal at Bottom Left)</p>
      </div>
      <div class="video-item">
        <video controls src="DP-BehaviourAndUncertaintyVisualization/square maze/upper_left.mp4"></video>
        <p>Square Maze (Goal at Upper Left)</p>
      </div>
    </div>
    <p>
      These results indicate that the uncertainty predicted by the observation encoder tracks the information currently available: once new information is discovered, the uncertainty responds rapidly and reflects how much information remains to be gathered.
    </p>
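    <p>One way to read the shaded blocks is as a band around the predicted mean whose width scales with the predicted standard deviation. The sketch below illustrates that reading only; it assumes the encoder outputs a mean and a standard deviation per privileged value, and the names <code>prediction_band</code> and <code>covers</code>, the band width <code>k</code>, and all numbers are hypothetical rather than taken from the USPL code.</p>
<pre>
```python
def prediction_band(mean, std, k=2.0):
    """Return the (low, high) band mean -/+ k * std for one predicted value."""
    return (mean - k * std, mean + k * std)


def covers(band, actual):
    """Return True when the actual privileged value lies inside the band."""
    low, high = band
    return high >= actual >= low


# Made-up numbers: a wide band early on, a narrow one after exploration.
early = prediction_band(mean=1.0, std=0.5)
late = prediction_band(mean=0.75, std=0.125)
print(early, covers(early, 0.8))
print(late, covers(late, 0.8))
```
</pre>
    <p>Under this reading, a shrinking band in the videos corresponds to a falling standard deviation as the robot gathers information.</p>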
  </section>

  <!-- Manually Set Uncertainty Section -->
  <section class="video-section">
    <h2>3. Manually Set Uncertainty</h2>
    <p>Here, instead of feeding the uncertainty output by the observation encoder to the policy, we manually set the uncertainty. The uncertainty value is displayed in the top-left corner of the video. We start by assigning a high uncertainty, and after some time, reduce it to observe how the policy responds.</p>

    <div class="video-pair">
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/cubestack.mp4"></video>
        <p>Blind-Mass Stack</p>
      </div>
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/lateral choice.mp4"></video>
        <p>Lateral Choice</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/midpoint choice (image).mp4"></video>
        <p>Midpoint Choice (image)</p>
      </div>
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/midpoint choice.mp4"></video>
        <p>Midpoint Choice</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/quadrotor.mp4"></video>
        <p>Biased Quadrotor</p>
      </div>
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/signpost nav (image).mp4"></video>
        <p>Signpost Nav (image)</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/signpost nav.mp4"></video>
        <p>Signpost Nav</p>
      </div>
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/square maze.mp4"></video>
        <p>Square Maze</p>
      </div>
    </div>
    <div class="video-pair">
      <div class="video-item">
        <video controls src="ManuallySetUncertainty/stairway search.mp4"></video>
        <p>Stairway Search</p>
      </div>
    </div>
    <p>
      As observed, when the uncertainty is high, the policy remains in an exploratory mode and avoids completing the task. Once the uncertainty decreases, the policy immediately proceeds to accomplish the task, demonstrating that our privileged policy is highly sensitive to uncertainty.
    </p>
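    <p>The override used in this experiment can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names <code>scheduled_uncertainty</code> and <code>policy_inputs</code>, the switch step, and the high and low values are hypothetical placeholders.</p>
<pre>
```python
def scheduled_uncertainty(step, switch_step=200, high=1.0, low=0.05):
    """Return the manually assigned uncertainty for a given control step.

    Hypothetical schedule: hold a high value first, then drop to a low one.
    """
    return high if switch_step > step else low


def policy_inputs(observation, encoder_uncertainty, step, manual=True):
    """Build the policy input, optionally replacing the encoder's uncertainty."""
    if manual:
        uncertainty = scheduled_uncertainty(step)
    else:
        uncertainty = encoder_uncertainty
    return {"observation": observation, "uncertainty": uncertainty}


# Before the switch the policy sees the high value, afterwards the low one.
print(policy_inputs([0.0, 0.0], 0.3, step=10)["uncertainty"])
print(policy_inputs([0.0, 0.0], 0.3, step=250)["uncertainty"])
```
</pre>
    <p>Keeping the observation unchanged and swapping only the uncertainty input isolates the policy's response to uncertainty from its response to anything else.</p>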
  </section>

  <!-- Tasks Descriptions Section -->
  <section class="task-desc">
    <h2>4. Task Descriptions</h2>

    <h3>Stairway Search</h3>
    <p><strong>Observation space:</strong> Depth images from an onboard depth camera; current robot position and orientation.</p>
    <p><strong>Action space:</strong> Desired change in heading (yaw) and desired body pitch.</p>
    <p><strong>Privileged observation:</strong> Coordinates of the target platform.</p>
    <p><strong>Reward:</strong> Shaped reward for approaching the target platform and a terminal reward for reaching it.</p>
    <p><strong>Task description:</strong> The robot starts on a large platform and must step onto a smaller platform with stairways on both sides to descend to the ground. A low-level controller receives target speed (0.5 m/s), desired pitch, and yaw commands.</p>
    <p><strong>Optimal behaviour:</strong> Peer over the edge to locate the smaller platform, then walk onto it.</p>

    <h3>Lateral Choice</h3>
    <p><strong>Observation space:</strong> Current robot position and orientation.</p>
    <p><strong>Action space:</strong> Desired change in heading (yaw).</p>
    <p><strong>Privileged observation:</strong> Coordinates of the goal point.</p>
    <p><strong>Reward:</strong> Shaped reward for approaching the goal and a terminal reward for reaching it.</p>
    <p><strong>Task description:</strong> The goal may be on the left or on the right. The robot cannot perceive the terrain and must find the goal by trial and error. The episode ends when the goal is reached.</p>
    <p><strong>Optimal behaviour:</strong> Walk toward one side; if the goal is not reached, turn around and go to the opposite side.</p>

    <h3>Blind-Mass Stack</h3>
    <p><strong>Observation space:</strong> Positions of three cubes; end-effector position; index of green cube; grasp flag and index; measured mass of grasped cube.</p>
    <p><strong>Action space:</strong> Discrete: end-effector x,y target (one of three cubes), end-effector z target (three heights), gripper open/close.</p>
    <p><strong>Privileged observation:</strong> Index of the red cube and the bias of the weight sensor.</p>
    <p><strong>Task description:</strong> Three cubes (red, green, blue) are placed left to right at indices 1–3. The green cube’s mass is known (1 kg); the red cube weighs 0.75 kg but its index is unknown; the blue cube’s mass is unknown. The weight sensor is biased by a factor, so the actor must self-calibrate and then stack the red cube on the green one.</p>
    <p><strong>Optimal behaviour:</strong> Grasp the green cube to calibrate the bias, then pick up one of the remaining cubes, check its calibrated mass, and stack the red cube on the green one.</p>
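    <p>The calibration arithmetic behind this behaviour can be sketched as follows. The sketch assumes the bias is a single multiplicative factor (reading = bias × true mass), which is one reading of "biased by a factor"; the function names and the tolerance <code>tol_kg</code> are hypothetical.</p>
<pre>
```python
GREEN_MASS_KG = 1.0   # known mass of the green cube (from the task description)
RED_MASS_KG = 0.75    # known mass of the red cube (from the task description)


def estimate_bias(measured_green_kg):
    """Infer the multiplicative sensor bias from a reading of the green cube."""
    return measured_green_kg / GREEN_MASS_KG


def is_red(measured_kg, bias, tol_kg=0.05):
    """Check whether a calibrated reading matches the red cube's mass.

    tol_kg is a hypothetical tolerance chosen for illustration.
    """
    calibrated = measured_kg / bias
    return tol_kg >= abs(calibrated - RED_MASS_KG)


# Example: the sensor reads 1.2 kg for the green cube, so the bias is 1.2.
bias = estimate_bias(1.2)
print(is_red(0.9, bias))   # 0.9 / 1.2 is about 0.75 kg
print(is_red(0.6, bias))   # 0.6 / 1.2 is about 0.5 kg
```
</pre>
    <p>Weighing the green cube first matters because it is the only cube whose index and mass are both known before any measurement.</p>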

    <h3>Signpost Nav</h3>
    <p><strong>Observation space:</strong> Position, orientation, plus either scandot heights or head-mounted depth images.</p>
    <p><strong>Action space:</strong> Desired change in heading (yaw).</p>
    <p><strong>Privileged observation:</strong> Coordinates of the goal point.</p>
    <p><strong>Reward:</strong> Shaped reward for moving toward the goal and a terminal reward for reaching it.</p>
    <p><strong>Task description:</strong> A signpost encodes the hidden goal: its orientation gives the direction to the goal and its length gives the distance. The robot must infer the goal from the signpost and navigate to it.</p>
    <p><strong>Optimal behaviour:</strong> Reach the signpost, infer the goal geometry, then travel to the goal.</p>
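    <p>The inference step can be illustrated with a small geometric sketch. It assumes the goal is measured from the signpost's position and that distance is proportional to signpost length; the function <code>infer_goal</code> and the scale <code>metres_per_unit_length</code> are hypothetical, since the page does not specify the exact encoding.</p>
<pre>
```python
import math


def infer_goal(signpost_xy, orientation_rad, length, metres_per_unit_length=1.0):
    """Project from the signpost along its orientation to estimate the goal.

    The length-to-distance scale is an assumed placeholder.
    """
    distance = length * metres_per_unit_length
    x, y = signpost_xy
    return (x + distance * math.cos(orientation_rad),
            y + distance * math.sin(orientation_rad))


# A signpost at (1, 2) pointing along +x with length 3 implies a goal at (4, 2).
print(infer_goal((1.0, 2.0), 0.0, 3.0))
```
</pre>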

    <h3>Square Maze</h3>
    <p><strong>Observation space:</strong> Position and orientation.</p>
    <p><strong>Action space:</strong> Desired change in heading (yaw).</p>
    <p><strong>Privileged observation:</strong> Coordinates of the goal point.</p>
    <p><strong>Reward:</strong> Shaped reward for approaching the goal and a terminal reward for reaching it.</p>
    <p><strong>Task description:</strong> The goal is at one of the four corners, and the maze layout varies slightly. The robot probes each junction: if it collides, it goes right; otherwise it goes left.</p>
    <p><strong>Optimal behaviour:</strong> Probe each junction to infer the layout and reach the goal corner.</p>

    <h3>Midpoint Choice</h3>
    <p><strong>Observation space:</strong> Position, orientation, plus either scandot heights or depth images.</p>
    <p><strong>Action space:</strong> Desired change in heading (yaw).</p>
    <p><strong>Privileged observation:</strong> Coordinates of the goal point.</p>
    <p><strong>Reward:</strong> Shaped reward for approaching the goal and a terminal reward for reaching it.</p>
    <p><strong>Task description:</strong> Each of the four corner platforms is preceded by a pair of columns; the goal platform is the one whose pair is the second tallest. The robot measures the column heights, identifies the second-tallest pair, and moves to the corresponding platform.</p>
    <p><strong>Optimal behaviour:</strong> Circle the platforms, record the column heights, identify the second-tallest pair, and go to the corresponding platform.</p>
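    <p>The selection rule in this task can be sketched in a few lines. The sketch assumes each platform's column pair is summarized by one measured height; the function <code>goal_platform</code> and the sample heights are hypothetical.</p>
<pre>
```python
def goal_platform(pair_heights):
    """Return the index of the platform whose column pair is second tallest.

    pair_heights: one height per platform (hypothetical representation of
    the measured column pair in front of each platform).
    """
    ranked = sorted(range(len(pair_heights)),
                    key=lambda i: pair_heights[i], reverse=True)
    return ranked[1]


heights = [0.8, 1.4, 1.1, 0.5]   # made-up measurements in metres
print(goal_platform(heights))    # index 2 holds the second-tallest pair
```
</pre>
    <p>Because the rule depends on a ranking, the robot cannot identify the goal until it has measured enough pairs to know which one is second tallest, which is why circling the platforms precedes the approach.</p>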

    <h3>Biased Quadrotor</h3>
    <p><strong>Observation space:</strong> Perceived altitude, roll, pitch, angular velocity, vertical velocity, and landed flag.</p>
    <p><strong>Action space:</strong> Target pitch, roll, and altitude.</p>
    <p><strong>Privileged observation:</strong> Biases in altitude, roll, and pitch sensors.</p>
    <p><strong>Reward:</strong> Shaped reward for maintaining the target altitude and attitude, a terminal reward for achieving them, and a penalty for touching the ground.</p>
    <p><strong>Task description:</strong> The quadrotor must hover at a target altitude despite biased sensor readings. Ground contact provides a reference against which the sensors can be calibrated before hovering.</p>
    <p><strong>Optimal behaviour:</strong> Land to zero-reference the sensors, calibrate, then take off and maintain a steady hover.</p>
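    <p>The zero-referencing step can be sketched as follows. The sketch assumes each bias is additive (reading = true value + bias) and that the true altitude, roll, and pitch are all zero while landed on level ground; the function names and the numbers are hypothetical.</p>
<pre>
```python
def calibrate_on_ground(landed_reading):
    """Estimate additive sensor biases from readings taken while landed.

    Assumes the true altitude, roll, and pitch are all zero on the ground,
    so whatever the sensors report there is the bias.
    """
    return dict(landed_reading)


def correct(reading, bias):
    """Subtract the estimated bias from a raw sensor reading."""
    return {key: reading[key] - bias[key] for key in reading}


bias = calibrate_on_ground({"altitude": 0.5, "roll": 0.25, "pitch": -0.125})
in_flight = correct({"altitude": 2.5, "roll": 0.25, "pitch": -0.125}, bias)
print(in_flight)   # altitude 2.0, roll 0.0, pitch 0.0
```
</pre>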
  </section>

</body>
</html>
