<!DOCTYPE html>
<html lang="en-us">

<head>
	<meta charset="utf-8">
	<meta name="generator" content="Hugo 0.88.1" />
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<link href="https://fonts.googleapis.com/css?family=Roboto:300,400,700" rel="stylesheet" type="text/css">
	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
	<link rel="stylesheet" href="css/custom.css">
	<link rel="stylesheet" href="css/normalize.css">

	<title>VALL-E 2</title>
	<link href="css/bootstrap.min.css" rel="stylesheet">

</head>

<body>

<div class="container" >
<header role="banner">
</header>
<main role="main">
<article itemscope itemtype="https://schema.org/BlogPosting">

<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
	<div class="text-center">
	<h1>VALL-E 2</h1>
        <h3>Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers</h3>
	</div>
	<p>
        <b>Abstract.</b> 
		This paper introduces VALL-E 2, the latest advancement in neural codec language models that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity for the first time. Based on its predecessor, VALL-E, this work introduces two significant enhancements: Repetition Aware Sampling refines the original nucleus sampling process by accounting for token repetition in the decoding history. It not only stabilizes the decoding but also circumvents the infinite loop issue. Grouped Code Modeling organizes codec codes into groups to effectively shorten the sequence length, which not only boosts inference speed but also addresses the challenges of long sequence modeling. Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases. The advantages of this work could contribute to valuable endeavors, such as generating speech for individuals with aphasia or people with amyotrophic lateral sclerosis.
		<br><br>
		This page is for <b>research demonstration purposes</b> only.
      </p>
</div>
		
<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">		
<!--	<h2 id="model-overview" style="text-align: center;">Model Overview</h2>-->
	<p style="text-align: center;">
		<img alt="Overview" src="pics/Overview.png" height="400" width="800">
	</p>
		<p style="text-align: center;">
		VALL-E 2 achieves human parity zero-shot TTS performance for the first time.
		</p>
</div>



<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
		<h2 id="Hard-Examples" style="text-align: center;">Hard Examples</h2>

			<p>
			VALL-E 2 can synthesize personalized speech even for the hard sentences from <a href="https://ereboas.github.io/ELLAV">ELLA-V</a>. The speaker prompts are sampled from the LibriSpeech dataset.
			</p>

		<div class="table-responsive pt-3">
			<table class="table table-hover pt-2">
			<thead>
			<tr>
			<th style="text-align: center;vertical-align:middle">Text</th>
			<th style="text-align: center;vertical-align:middle">Speaker Prompt </th>
			<th style="text-align: center;vertical-align:middle">VALL-E </th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2</th>
			</tr>
			</thead>
			<tbody>

			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">F one F two F four F eight H sixteen H thirty two H sixty four</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/0/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Clever cats carefully crafted colorful collages creating cheerful compositions</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/7/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Curious koalas curiously climbed curious curious climbers</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/40/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Sad snakes sadly sighed sad sad sighs</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/42/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Joyful jaguars joyfully jumped joyful joyful jumps</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/46/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Noisy newts nonsensically nibbled noisy noisy nibbles</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/48/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">Crafting a symphony of flavors the skilled chef orchestrated a culinary masterpiece that left an indelible mark mark mark mark mark on the palates of the discerning diners</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/67/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: left;vertical-align:middle;width: 500px">The future belongs to belongs to belongs to belongs to belongs to those who believe in the beauty of the beauty of the beauty of the beauty of the beauty of their dreams</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle2.0.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle2.1.wav" autoplay/>Your browser does not support the audio element.</audio>
				<audio controls="controls" style="width: 140px;"><source src="audios/hard_samples/89/valle2.2.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			</tbody>
			</table>
		</div>
</div>


<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
	<h2 id="librispeech-samples" style="text-align: center;">LibriSpeech Samples</h2>
		<p>
		VALL-E 2 can perform zero-shot speech continuation, using the first 3 seconds of an utterance as the speaker prompt, and speech synthesis, using a reference utterance from an unseen speaker as the speaker prompt. The audio and transcriptions are sampled from the LibriSpeech dataset.
		</p>
		<div class="table-responsive pt-3">
			<table class="table table-hover pt-2">
			<thead>
			<tr>
			<th style="text-align: center;vertical-align:middle">Text</th>
			<th style="text-align: center;vertical-align:middle">Speaker Prompt (Prefix/Ref)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E </th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br> (Group Size 1)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br> (Group Size 2)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br> (Group Size 4)</th>
<!--			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br> (Shift 8)</th>-->
			</tr>
			</thead>
			<tbody>
			<tr>
			<td rowspan="2" style="text-align: left;vertical-align:middle;width: 500px">They moved thereafter cautiously about the hut groping before and about them to find something to show that Warrenton had fulfilled his mission</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/conti_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/conti_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/conti_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/conti_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/conti_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/cross_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/cross_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/cross_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/cross_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/809/cross_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="2" style="text-align: left;vertical-align:middle;width: 500px">And lay me down in thy cold bed and leave my shining lot</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/conti_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/conti_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/conti_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/conti_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/conti_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/cross_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>

			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/cross_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/cross_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/cross_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1216/cross_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="2" style="text-align: left;vertical-align:middle;width: 500px">Number ten fresh nelly is waiting on you good night husband</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/conti_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/conti_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/conti_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/conti_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/conti_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/cross_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/cross_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/cross_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/cross_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/1/cross_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="2" style="text-align: left;vertical-align:middle;width: 500px">Yea his honourable worship is within but he hath a godly minister or two with him and likewise a leech</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/conti_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/conti_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/conti_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/conti_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/conti_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/cross_infer/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/cross_infer/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/cross_infer/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/cross_infer/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/librispeech/74/cross_infer/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			</tbody>
			</table>
		</div>
</div>

<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">
	<h2 id="vctk-samples" style="text-align: center;">VCTK Samples</h2>
		<p>
		Zero-shot TTS from 3-second, 5-second and 10-second speaker prompts. The audio and transcriptions are sampled from the VCTK dataset.
		</p>
		<div class="table-responsive pt-3">
			<table class="table table-hover pt-2">
			<thead>
			<tr>
				<th style="text-align: center;vertical-align:middle">Text</th>
				<th style="text-align: center;vertical-align:middle">Speaker Prompt (3s/5s/10s)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E </th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br>  (Group Size 1)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br>  (Group Size 2)</th>
			<th style="text-align: center;vertical-align:middle">VALL-E 2 <br>  (Group Size 4)</th>
			</tr>
			</thead>
			<tbody>
			<tr>
			<td rowspan="3" style="text-align: left;vertical-align:middle;width: 500px">We have to reduce the number of plastic bags</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/3s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/3s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/3s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/3s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/3s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/5s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/5s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/5s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/5s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/5s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/10s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/10s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/10s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/10s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/27/10s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="3" style="text-align: left;vertical-align:middle;width: 500px">So what is the campaign about</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/3s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/3s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/3s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/3s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/3s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/5s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/5s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/5s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/5s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/5s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/10s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>

			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/10s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/10s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/10s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/36/10s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="3" style="text-align: left;vertical-align:middle;width: 500px">My life has changed a lot</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/3s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/3s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/3s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/3s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/3s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/5s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/5s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/5s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/5s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/5s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/10s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/10s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/10s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/10s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/46/10s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td rowspan="3" style="text-align: left;vertical-align:middle;width: 500px">Nothing is yet confirmed</td>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/3s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/3s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/3s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/3s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/3s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/5s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/5s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/5s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/5s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/5s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			<tr>
			<td style="text-align: center;vertical-align:middle"><audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/10s/prompt.wav" autoplay/>Your browser does not support the audio element.</audio></td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/10s/valle.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/10s/valle2_shift1.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/10s/valle2_shift2.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			<td style="text-align: center;vertical-align:middle">
				<audio controls="controls" style="width: 140px;"><source src="audios/vctk/28/10s/valle2_shift4.vocos.0.wav" autoplay/>Your browser does not support the audio element.</audio>
			</td>
			</tr>
			</tbody>
			</table>
		</div>
</div>




<div class="container pt-5 mt-5 shadow p-5 mb-5 bg-white rounded">		
	<h2 id="Ethics-Statement" style="text-align: center;">Ethics Statement</h2>
	<p>
	Because VALL-E 2 can synthesize speech that preserves speaker identity, it may carry risks of misuse, such as spoofing voice identification or impersonating a specific speaker.
	We conducted the experiments under the assumption that the user agrees to be the target speaker in speech synthesis.
	If the model is generalized to unseen speakers in the real world, it should be accompanied by a protocol to ensure that the speaker approves the use of their voice, as well as a synthesized speech detection model.
	</p>
</div>
 
</article>
</main>
</div>

</body>
</html>
