<html>
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>MoDA: Multi-modal Diffusion Architecture for Talking Head Generation</title>
        <link rel="stylesheet" href="static/css/bulma.min.css">
        <link rel="stylesheet" href="static/css/bulma-carousel.min.css">
        <link rel="stylesheet" href="static/css/bulma-slider.min.css">
        <link rel="stylesheet" href="static/css/fontawesome.all.min.css">
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
        <link href="./asserts/style.css" rel="stylesheet">

        <script async src="//busuanzi.ibruce.info/busuanzi/2.3/busuanzi.pure.mini.js"></script>
    </head>

    <body>
        <div class="content">
            <h1><strong>MoDA: Multi-modal Diffusion Architecture for Talking Head Generation</strong></h1>
        </div>
        <div class="content">
            <h2 style="text-align:center"><strong>Abstract</strong></h2>

            <div id="teasers">
                <img src="asserts/frameworks.png" style="width: 100%;">
                <figcaption></figcaption>
            </div>

            <p style="line-height: 30px;">
Talking head generation with arbitrary identities and speech audio remains a crucial problem in the realm of the virtual metaverse.
Despite progress, current methods still struggle to synthesize diverse facial expressions and natural head movements while keeping lip movements synchronized with the audio.
The main challenge lies in the stylistic discrepancies among speech audio, individual identity, and portrait dynamics.
To address this inter-modal inconsistency, we introduce MoDA, a multi-modal diffusion architecture built on two key designs. First, MoDA explicitly models the interaction among motion, audio, and auxiliary conditions, enhancing overall facial expressions and head dynamics. Second, a coarse-to-fine fusion strategy progressively integrates the different conditions, ensuring effective feature fusion. Experimental results demonstrate that MoDA improves video diversity, realism, and efficiency, making it suitable for real-world applications.
            </p>

        </div>

        <div class="content">
            <h2 style="text-align: center;"><strong>Gallery</strong></h2>
        
            <h3>Qualitative Evaluation.</h3>
            <div class="gallery">
                <div class="row" style="display: flex; flex-wrap: nowrap;">
                    <!-- moda -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>MoDA</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/our.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/moda.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/moda.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/moda.mp4" type="video/mp4">
                        </video>
                    </div>
                
                    <!-- echomimic -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>EchoMimic</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/ec.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/ec.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/ech.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/ech.mp4" type="video/mp4">
                        </video>
                    </div>
                
                    <!-- hallo2 -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>Hallo2</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/hallo2.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/hallo2.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/hallo2.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/hallo2.mp4" type="video/mp4">
                        </video>
                    </div>
                
                    <!-- hallo -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>Hallo</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/hallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/hallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/hallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/hallo.mp4" type="video/mp4">
                        </video>
                    </div>
                
                    <!-- joyhallo -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>JoyHallo</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/joyhallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/joyhallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/joyhallo.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/joyhallo.mp4" type="video/mp4">
                        </video>
                    </div>
                
                    <!-- joyvasa -->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>JoyVASA</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/joyvasa.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/joyvasa.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/joyvasa.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/joyvasa.mp4" type="video/mp4">
                        </video>
                    </div>
                    <!-- ditto-->
                    <div style="width: 14.28%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>Ditto</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compara5/ditto.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare/ditto.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare2/ditto.mp4" type="video/mp4">
                        </video>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/compare3/ditto.mp4" type="video/mp4">
                        </video>
                    </div>
                </div>
            </div>

            <h3>Talking Head Generation in Complex Scenarios.</h3>
            <div class="gallery">
                <div class="row">
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/2.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/3.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/4.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/5.mp4" type="video/mp4">
                    </video>
                </div>
                <div class="row">
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/6.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/7.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/8.mp4" type="video/mp4">
                    </video>
                    <video style="width: 25%; object-fit: cover;" controls>
                        <source src="moda/Complex Scenarios/9.mp4" type="video/mp4">
                    </video>
                </div>
            </div>
            <h3>Fine-grained Emotion Control.</h3>
            <div class="gallery">
                <div class="row">
                    <div style="width: 50%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>Happy</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Emotion Control/happy.mp4" type="video/mp4">
                        </video>
                    </div>
                    <div style="width: 50%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>Sad</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Emotion Control/sad.mp4" type="video/mp4">
                        </video>
                    </div>
                </div>
            </div>
            <h3>Long Video Generation.</h3>
            <div class="gallery">
                <div class="row">
                    <video style="width: 50%; object-fit: cover;" controls>
                        <source src="moda/Long Videos Generation/1.mp4" type="video/mp4">
                    </video>
                </div>
            </div>
            <h3>Ablation Study.</h3>
            <div class="gallery">
                <div class="row" style="display: flex;">
                    <div style="width: 25%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>MoDA</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Ablation Study/moda.mp4" type="video/mp4">
                        </video>
                    </div>
                    <div style="width: 25%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>w CABA</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Ablation Study/wcaba.mp4" type="video/mp4">
                        </video>
                    </div>
                    <div style="width: 25%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>replace audio</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Ablation Study/audio.mp4" type="video/mp4">
                        </video>
                    </div>
                    <div style="width: 25%; padding: 0 5px; box-sizing: border-box;">
                        <p style="text-align: center;"><strong>replace image</strong></p>
                        <video style="width: 100%; object-fit: cover;" controls>
                            <source src="moda/Ablation Study/image.mp4" type="video/mp4">
                        </video>
                    </div>
                 </div>
            </div>
        </div>

        <footer style="text-align: center; font-size: medium; color: blueviolet;">
            <span id="busuanzi_container_page_pv">Page Views: <span id="busuanzi_value_page_pv"></span></span>
        </footer>

    </body>

</html>