<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>LightningSpeech</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            margin: 20px;
            padding: 0;
            background-color: #f4f4f4;
        }
        .container {
            max-width: 1200px;
            margin: auto;
            background: white;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0,0,0,0.1);
            overflow-x: auto; /* Enable horizontal scrolling */
        }
        table {
            width: 100%;
            border-collapse: collapse;
            margin-bottom: 20px; /* Space between tables */
        }
        th, td {
            text-align: left;
            padding: 8px;
            border: 1px solid #ddd;
        }
        th {
            background-color: #f2f2f2;
        }
        audio {
            width: 100%;
            min-width: 150px; /* Minimum width for audio player */
        }
    </style>
</head>
<body>

<div class="container">
    <h1>EDM-TTS: Efficient Dual-Stage Masked Modeling for Alignment-Free Text-to-Speech Synthesis</h1>
    <h2>Zero-Shot TTS examples sampled randomly from the evaluation data of Table 3.</h2>

    <p>Text input: If you thought I lived in New York, why in the world didn't you come and see me? the lady inquired.</p>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ours</th>
            <th>HierSpeech++</th>
            <th>WhisperSpeech</th>
            <th>XTTSv2</th>
            <th>StyleTTS2</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/WhisperSpeech.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/XTTSv2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence1/StyleTTS2.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <p>Text input: However loudly outward circumstances might oppose this, he now felt, with a certainty which surprised him, that this work was not his own.</p>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ours</th>
            <th>HierSpeech++</th>
            <th>WhisperSpeech</th>
            <th>XTTSv2</th>
            <th>StyleTTS2</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/WhisperSpeech.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/XTTSv2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence2/StyleTTS2.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <p>Text input: The railroads had not reached Jackson county, and wild game was plentiful on my father's farm on Big Creek near Lee's Summit.</p>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ours</th>
            <th>HierSpeech++</th>
            <th>WhisperSpeech</th>
            <th>XTTSv2</th>
            <th>StyleTTS2</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/WhisperSpeech.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/XTTSv2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence3/StyleTTS2.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <p>Text input: Then he reappeared, creeping along the earth, from which his dress was hardly distinguishable, directly in the rear of his intended captive.</p>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ours</th>
            <th>HierSpeech++</th>
            <th>WhisperSpeech</th>
            <th>XTTSv2</th>
            <th>StyleTTS2</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/WhisperSpeech.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/XTTSv2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Zero-Shot%20TTS/Sentence4/StyleTTS2.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <h2>Voice Conversion examples sampled randomly from the evaluation data of Table 2.</h2>

    <table>
        <tr>
            <th>Source Utterance</th>
            <th>Target Speaker</th>
            <th>Ours</th>
            <th>HierSpeech++</th>
            <th>DiffHierVC</th>
            <th>SoundStorm</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/source.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/target.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/DiffHierVC.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair1/SoundStorm.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/source.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/target.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/DiffHierVC.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair2/SoundStorm.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/source.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/target.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/DiffHierVC.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair3/SoundStorm.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/source.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/target.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/Ours.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/HierSpeech++.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/DiffHierVC.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Voice%20Conversion/Pair4/SoundStorm.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <h2>Ablation Study (Reconstruction)</h2>
    <h3>Injection Conformer: examples sampled randomly from the evaluation data of Table 5.</h3>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ground Truth</th>
            <th>No Injection (no-inj)</th>
            <th>Injection 1 (inj1)</th>
            <th>Injection 2 (inj2)</th>
            <th>Injection 3 (inj3)</th>
            <th>No Skip Connection (noskip)</th>
            <th>Ours (Iters=8)</th>
            <th>Iters=4</th>
            <th>Iters=2</th>
            <th>Iters=1</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/No%20Injection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Injection%201.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Injection%202.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Injection%203.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/No%20Skip%20Connection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Ours%20(Iters=8).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Iters%3D4.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Iters%3D2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File1/Iters%3D1.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/No%20Injection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Injection%201.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Injection%202.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Injection%203.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/No%20Skip%20Connection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Ours%20(Iters=8).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Iters%3D4.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Iters%3D2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File2/Iters%3D1.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/No%20Injection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Injection%201.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Injection%202.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Injection%203.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/No%20Skip%20Connection.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Ours%20(Iters=8).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Iters%3D4.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Iters%3D2.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Reconstruction)/File3/Iters%3D1.wav" type="audio/wav"></audio></td>
        </tr>
    </table>

    <h2>Ablation Study (Resynthesis)</h2>
    <h3>Text-to-Semantic: examples sampled randomly from the evaluation data of Table 6.</h3>
    <table>
        <tr>
            <th>Speaker Prompt</th>
            <th>Ground Truth</th>
            <th>GT Length</th>
            <th>0.7x GT Length</th>
            <th>1.3x GT Length</th>
            <th>Ours (Iters=16, Pred Length)</th>
            <th>Iters=8</th>
            <th>Iters=4</th>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Iters=16,%20GT%20Len.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Iters=16,%20GT%20Len%20(0.7).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Iters=16,%20GT%20Len%20(1.3).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Ours%20(Iters=16).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Iters=8.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File1/Iters=4.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Iters=16,%20GT%20Len.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Iters=16,%20GT%20Len%20(0.7).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Iters=16,%20GT%20Len%20(1.3).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Ours%20(Iters=16).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Iters=8.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File2/Iters=4.wav" type="audio/wav"></audio></td>
        </tr>
        <tr>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/prompt.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/reference.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Iters=16,%20GT%20Len.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Iters=16,%20GT%20Len%20(0.7).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Iters=16,%20GT%20Len%20(1.3).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Ours%20(Iters=16).wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Iters=8.wav" type="audio/wav"></audio></td>
            <td><audio controls><source src="samples/Ablation%20(Resynthesis)/File3/Iters=4.wav" type="audio/wav"></audio></td>
        </tr>

    </table>
</div>

</body>
</html>
