<!DOCTYPE html>
<html data-wf-domain="" data-wf-page="596e65d120426e09785027f0" data-wf-site="596e65d120426e09785027eb"
    data-wf-status="1"
    class="w-mod-js wf-opensans-n3-active wf-opensans-n4-active wf-roboto-n4-active wf-opensans-i3-active wf-opensans-i4-active wf-opensans-n6-active wf-opensans-i6-active wf-opensans-n7-active wf-opensans-i7-active wf-opensans-n8-active wf-opensans-i8-active wf-roboto-n3-active wf-roboto-n5-active wf-active">

<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

    <title>CODA</title>
    <meta content="width=device-width, initial-scale=1" name="viewport">
    <meta content="Webflow" name="generator">
    <link href="./files/supplemental.css" rel="stylesheet" type="text/css">
    <script src="./files/webfont.js" type="text/javascript"></script>
    <script type="text/javascript">
        WebFont.load({
            google: {
                families: ["Open Sans:300,300italic,400,400italic,600,600italic,700,700italic,800,800italic", "Roboto:300,regular,500"]
            }
        });
    </script>
    <script type="text/javascript">
        ! function (o, c) {
            var n = c.documentElement,
                t = " w-mod-";
            n.className += t + "js", ("ontouchstart" in o || o.DocumentTouch && c instanceof DocumentTouch) && (n.className += t + "touch")
        }(window, document);
    </script>
</head>


<body class="body">
    <div class="section">
        <div class="container-3 w-container">
            <h1 class="papertitle">CODA: Commonsense-Driven Autoregressive Human Interaction Generation</h1>
            <div class="text-block">ICLR 2026 - Submission No: 1955 </div>
        </div>
    </div>

    <div class="section-2">
        <div class="container w-container">
            <ul role="list" class="list">
                <li>
                    <a href="#experiment_a">A. Performance of CODA</a>
                </li>
                <li>
                    <a href="#experiment_b">B. Comparison with the State of the Art</a>
                </li>
                <li>
                    <a href="#experiment_c">C. Ablation Results </a>
                </li>
                <li>
                    <a href="#experiment_d">D. More Results of CODA</a>
                </li>

            </ul>
        </div>
    </div>

<div>
  <div class="container-2 w-container">
    <div class="container-2 w-container">
      <h3 id="experiment_a" class="experimenttitle">A. Performance of CODA (Figure 1)</h3>
    </div>

    <!-- <p class="paragraph">
      Here, we present side-by-side comparisons of the joint-level keypoints generated by our model and their
      conversion to SMPL. It can be seen our model outputs smooth and fluid motions, and the observed sudden
      movements and lack of fluidity arise during the SMPL conversion, as the utilized conversion code
      processes each frame independently.
    </p> -->
    <!-- Text under first row -->
    <div class="w-row">
      <div class="w-col w-col-6">
        <p style="text-align:center; margin-top: 10px;">Two people <strong><span style="color:red;">bow </span></strong> to each other.</p>
      </div>
      <div class="w-col w-col-6">
        <p style="text-align:center; margin-top: 10px;">The two people <strong><span style="color:red;">hug </span></strong> each other tightly.
</p>
      </div>
    </div>
    <!-- First row of videos -->
    <p style="text-align:center; margin-top: 10px;"><strong>InterMask</strong></p>
    <div class="videoresult w-row">

            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/InterMask_bow_.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/InterMask_hug_.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/InterMask_bow.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/InterMask_hug.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            </div>
    <p style="text-align:center; margin-top: 10px;"><strong>CODA (ours)</strong></p>

    <div class="videoresult w-row">
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/our_bow_.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/our_hug_.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/our_bow.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
            <div class="w-col w-col-6 video-center">
                <video width="80%" height="100%" src="./demo_videos/Performance_results/our_hug.mp4" type="video/mp4"
                    loop="true" autoplay="autoplay" controls muted></video>
            </div>
    </div>


    <hr>
  </div>
</div>


    <div>
        <div class="container-2 w-container">
            <div class="container-2 w-container">
                <h3 id="experiment_b" class="experimenttitle">B. Comparison with the State of the Art (Figure 4)</h3>
            </div>
            <!-- <p class="paragraph">
                Here, we provide a qualitative comparison of interaction sequences generated by our CODA and InterMask [1], trained on the InterHuman dataset, for the same text descriptions. For the prompt "The two people hug each other tightly", InterMask suffers from issues of unintentional separation and penetration, while CODA ensures contact is maintained while avoiding penetration. For the prompt "One person sneaks up on the other from behind", InterMask generates a leaning-back issue for the first person, whereas CODA suppresses this unrealistic posture. Lastly, for the prompt "Two people are boxing. One is continuously punching while the other is defending and counterattacking", InterMask encounters a skeletal stretching issue, while CODA generates a more reasonable skeletal distribution. These examples demonstrate that, compared to InterMask, CODA produces more realistic, higher-quality, and physically plausible interactions.
            </p> -->
                <div class="w-row">
      <div class="w-col w-col-4">
        <p style="text-align:center; margin-top: 10px;">The two people <strong><span style="color:#c05252;">hug</span></strong> each other tightly.</p>
      </div>
      <div class="w-col w-col-4">
        <p style="text-align:center; margin-top: 10px;">One person <strong><span style="color:#c05252;">sneaks up</span></strong> on the other from behind.
</p>
      </div>
            <div class="w-col w-col-4">
        <p style="text-align:center; margin-top: 10px;">Two people are <strong><span style="color:#c05252;">boxing</span></strong>. One is continuously 
  <strong><span style="color:#c05252;">punching</span></strong> while the other is 
  <strong><span style="color:#c05252;">defending and counterattacking</span></strong>.
</p>
      </div>
    </div>
    <p style="text-align:center; margin-top: 10px;"><strong>InterMask</strong></p>
            <div class="videoresult w-row">
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/InterMask_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/Intermask_sneak.mp4"
                        type="video/mp4" loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/InterMask_boxing.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>
<p style="text-align:center; margin-top: 10px;"><strong>CODA (ours)</strong></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/our_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/our_sneak.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <video width="100%" height="100%" src="./demo_videos/Compare_results/Our_boxing.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>



            <hr>
        </div>
    </div>

    <div>
        <div class="container-2 w-container">
            <div class="container-2 w-container">
                <h3 id="experiment_c" class="experimenttitle">C. Ablation Results (Figure 6)</h3>
            </div>
            <!-- <p class="paragraph">
                To evaluate the impact of codesize on the performance of CODA, we conducted ablation studies under
                different codesize settings. The results show that when the codesize is set to 1024, the model achieves
                significant improvements across various evaluation metrics, including R TOP, FID, MMDist, and MModality.
                This performance gain is primarily attributed to the larger codesize, which enables more comprehensive
                encoding and storage of behavioral information, thereby significantly enhancing the quality and physical
                plausibility of the generated motions.
            </p>
            <div class="three-line-container">
                <span class="three-line-text">Ablation results on different CodeSize values on InterHuman test
                    dataset.</span>
            </div>
            <div class="image-result w-row">
                <div class="w-col w-col-6">
                    <img src="./demo_videos/Codesize_result/codesize.png" alt="codesize" width="100%" height="100%">
                </div>
                <div class="w-col w-col-6">
                    <img src="./demo_videos/Codesize_result/codesize_curve.png" alt="codesize_curve" width="40%"
                        height="25%">
                </div>
            </div> -->
               <!-- <p class="paragraph">
                Furthermore, in Fig.fig5, generation methods without loss constraints exhibit issues such as non-contact and penetration. In contrast, motions generated with the Human-Human loss constraints successfully capture interactive motions such as “handshake” and “hug,” further validating the physical plausibility of our CODA.
            </p>          -->

    <div class="w-row">
      <div class="w-col w-col-6">
        <p style="text-align:center; margin-top: 10px;"><b>The first person <span style="color:red;">shakes</span> hands with the second to say hello.</b></p>
      </div>
      <div class="w-col w-col-6">
        <p style="text-align:center; margin-top: 10px;"><b>Two persons <span style="color:red;">walk</span> forward while <span style="color:red;">hugging</span> each other.</b>
</p>

      </div>
           
    </div>
    <p style="text-align:center; margin-top: 10px;"><strong>InterMask</strong></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/baseline_shake.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/baseline_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>
    <p style="text-align:center; margin-top: 10px;"><strong>VQ-VAE</strong></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-6 video-center">
                    <div class="three-line-container">
                        <center>W/ L<sub>COM</sub></center>
                    </div>
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/COM_shake.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-6 video-center">
                    <div class="three-line-container">
                        <center>W/ L<sub>COM</sub> &amp; L<sub>KTraj</sub></center>
                    </div>
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/CK_shake.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>
    <p style="text-align:center; margin-top: 10px;"><strong>CVQ-VAE</strong></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-6 video-center">
                    <div class="three-line-container">
                        <center>W/ L<sub>DM</sub></center>
                    </div>
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/CD_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-6 video-center">
                    <div class="three-line-container">
                        <center>W/ L<sub>GDM</sub></center>
                    </div>
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/CG_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>
    <p style="text-align:center; margin-top: 10px;">W/ L<sub>COM</sub> &amp; L<sub>KTraj</sub> &amp; L<sub>GDM</sub></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/CKG_shake.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/CKG_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>
        <p style="text-align:center; margin-top: 10px;"><strong>CODA (ours)</strong></p>

            <div class="videoresult w-row">
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/our_shake.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-6 video-center">
                    <video width="80%" height="100%" src="./demo_videos/Ablation_results/our_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>



            <hr>
        </div>
    </div>



    <div>
        <div class="container-2 w-container">
            <div class="container-2 w-container">
                <h3 id="experiment_d" class="experimenttitle">D. More Results of CODA (Figure 9)</h3>
            </div>
            <!-- <p class="paragraph">
                Here, we provide more qualitative results of our CODA. It can be observed that CODA is capable
                of generating physically plausible motions (e.g., walking, hugging, and attacking), while maintaining a
                high level of consistency between text and motion (e.g., waving, sitting, and blaming). These results
                further validate the original intention and effectiveness of the proposed method in enhancing the
                plausibility of motion generation. 
            </p> -->

            <div class="videoresult w-row">
                <div class="w-col w-col-4">
                    <br>
                    <div class="three-line-container">
                        <center>One person <strong><span style="color:red;">approaches</span></strong> the other.
                        </center>
                    </div>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\approch.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/More_results/approch_blender.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <br>
                    <div class="three-line-container">
                        <center>Two people are <strong><span style="color:red;">waving</span></strong> their hands and
                            performing a dance step together.</center>
                    </div>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\waving.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/More_results/waving_blender.mp4"
                        type="video/mp4" loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <br>
                    <div class="three-line-container">
                        <center> First person is <strong><span style="color:red;">sitting</span></strong> in a chair,
                            the second <strong><span style="color:red;">takes</span></strong> a step forward with their
                            right foot.
                        </center>
                    </div>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\sitting.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/More_results/sitting_blender.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>

            <div class="videoresult w-row">
                <div class="w-col w-col-4">
                    <div class="three-line-container">
                        <center> The two are <strong><span style="color:red;">blaming</span></strong> each other and
                            having an intense argument.
                        </center>
                    </div>
                    <br>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\blaming.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/More_results/blaming_blender.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <div class="three-line-container">
                        <center> Two persons <strong><span style="color:red;">walk</span></strong> forward while
                            <strong><span style="color:red;">hugging</span></strong> each other.
                        </center>
                    </div>
                    <br>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\walk.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/Ablation_results/our_hug.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
                <div class="w-col w-col-4">
                    <div class="three-line-container">
                        <center> Both people are doing <strong><span style="color:red;">fencing</span></strong>
                            practice, <strong><span style="color:red;">attacking</span></strong> each other with their
                            swords. During the practice, the first person make a short lunge and touches the tip of the
                            sword to the top of the second's head.
                        </center>
                    </div>
                    <br>
                    <!-- <video width="100%" height="100%" source="" src="./demo_videos\More_results\fencing.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video> -->
                    <video width="100%" height="100%" src="./demo_videos/More_results/fencing_blender.mp4" type="video/mp4"
                        loop="true" autoplay="autoplay" controls muted></video>
                </div>
            </div>



            <hr>
        </div>
    </div>

    <script src="./files/jquery-3.4.1.min.220afd743d.js" type="text/javascript"
        integrity="sha256-CSXorXvZcTkaix6Yvo6HppcZGetbYMGWSFlBw8HfCJo=" crossorigin="anonymous"></script>
    <script src="./files/webflow.3cd0ca831.js" type="text/javascript"></script>
</body>