<section class="project-info scroll-section" id="summary-section">
    <div class="project-info-inner">
        <h2 class="section-header">
            Observational Learning for Robots
            <span class="section-subtitle">
                Animals and humans often learn by watching others perform a task,
                then reproducing it themselves from what they observed.
                If robots could learn this way, they would no longer require expert supervision in the form of hand-designed
                <span class="term term--bold" data-tooltip="Mathematical reward definitions can be challenging to design.

                For example, rewarding a robot for pushing a block to a location using only camera images may require complex computer vision algorithms to first identify the block and then assign the reward.">rewards</span>
                or
                <span class="term term--bold" data-tooltip="Remotely controlling a robot via a human operator. This can be challenging for complex robots (like humanoids or tactile hands) or for non-expert users!">tele-operation</span>.
            </span>
        </h2>
        <br>
        <!-- </div> -->
        <!-- <div class="highlight-box">
            We introduce <strong>MPAIL2</strong> for real-world observational learning on robots
        </div> -->
        <div class="teaser-video-wrapper">
            <div class="video-display teaser-video" style="overflow: hidden;">
                <video autoplay muted loop playsinline style="width: 100%; height: 100%; object-fit: cover;">
                    <source src="Media/Video/teaser.mp4" type="video/mp4">
                </video>
            </div>
        </div>
        <br>
        <!-- <h2>Model Predictive Adversarial Imitation Learning 2 (MPAIL2)</h2> -->
        <h2>Key Results</h2>
        <div class="highlight-box-group">
            <div class="highlight-box highlight-box--emerald">
                <strong>A First in Real-World Observational Learning</strong>
                <br>
                This work presents the first demonstration of observational learning
                (<span class="term term--bold" data-tooltip-html="Inverse Reinforcement Learning from Observation. IRLfO is a type of inverse reinforcement learning that learns a reward function from observations of a task being performed by an expert.
                <br>
                <br>
                <i>While other observational learning frameworks exist, they do not readily admit improvement with experience.</i>
                ">IRLfO</span>)
                <span class="term term--bold" data-tooltip-html="Achieved without offline pre-training, prior data, models, or simulations. All learning is done in the real world with randomly initialized model weights.
                <br>
                <br>
                While it is possible to pre-train MPAIL2, this work aims to remain as general as possible. For instance, non-manipulation settings often do not have readily available pre-trained policy models.
                "
                >purely in the real world from scratch</span>.
            </div>
            <div class="highlight-box highlight-box--emerald">
                <strong>A First in Transfer Learning</strong>
                <br>
                This work is the first to demonstrate
                <span class="term term--bold" data-tooltip-html="Currently one of the primary differences between humans and machines, transfer learning is the process of transferring knowledge from one task to another.
                ">transfer learning</span> from scratch in the real world.
            </div>
            <div class="highlight-box highlight-box--orange">
                <strong>Real-World Sample Efficiency</strong>
                <br>
                Where a state-of-the-art baseline in RL with demonstrations (<span class="term term--bold" data-tooltip-html="<a href='https://proceedings.mlr.press/v202/ball23a/ball23a.pdf' target='_blank'>Reinforcement Learning with Prior Data (RLPD)</a>, Ball et al. 2023.
                ">RLPD</span>) sees no success after over an hour of real-world training, MPAIL2 sees consistent success in under 40 minutes.
            </div>
        </div>
    </div>
</section>