<!doctype html>
<html lang="en">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<style>
	@import url('https://fonts.cdnfonts.com/css/chalkduster');
</style>
	<!-- Required meta tags -->
	<meta charset="utf-8">
	<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

	<!-- Bootstrap CSS -->
	<link href="./resources/bootstrap.min.css" rel="stylesheet">
	<link href="./resources/stylesheet.css" rel="stylesheet">

	<title>Supplementary Website</title>
</head>


<body>

	<section class="jumbotron text-center pb-2">
		<div class="container">
			<h1 class="jumbotron-heading">Weakly Supervised Motion Learning for <br> Co-speech Gesture Video Generation</h1>

			<h4 class="font-italic pt-2" style="font-weight: normal">ICLR 2026 [Submission ID: 7934]</h4>

		</div>
	</section>

	<div class="container">


		<div class="row pt-3 text-center">
			  


		</div>		
		<div class="row justify-content-sm-center">
			  

			<a class="sm-1 mx-1 btn btn-primary" href="./comparisons.html" role="button">Videos for Comparisons</a>
			<a class="sm-1 mx-1 btn btn-primary" href="./ablations.html" role="button">Videos for Ablation Studies</a>
			<a class="sm-1 mx-1 btn btn-primary" href="./indentities.html" role="button">Videos for Other Identities</a>
			<a class="sm-1 mx-1 btn btn-primary" href="./long.html" role="button">Long Videos</a>

		</div>		




			<hr class="mt-5">


			
			<h2 class="pt-4 text-center">Videos for Ablation Studies</h2>

			<p class="lead">This page shows videos from each stage of our framework to illustrate how quality improves step by step. Stages 1 and 2 capture the motion representation but lack fine-grained visual detail. Stage 3 refines these details, although hand artifacts remain. Adding hand refinement yields our complete method (&ldquo;Ours&rdquo;), which produces visually coherent, artifact-free videos. We also observe that overall visual quality, including facial detail, improves as hand quality improves, which suggests that hand realism is strongly correlated with the perceived quality of the whole video.
				<br><br>
				We also show Stage 2 results without the invertible feature extractor (&ldquo;Stage 2 W/o IFE&rdquo;), which demonstrate that this component is necessary in our framework. Finally, we show results generated from audio alone (&ldquo;W/o Motion Information&rdquo;), which highlight the limitations of audio-only video synthesis. Together, these experiments underscore the importance of motion representation learning to our method.
			</p>

			<table width="1200" style="margin-left: -55px;" align="center">
				<tbody>

						<tr>
						
						
						<th style="text-align: center; padding: 10px;">

							<div style="width: 1000px; margin: auto; border-bottom: 1px solid #000;">
								<div class="row justify-content-sm-center" style="display: flex; width: 1000px; height: 40px; margin-left: 0px">
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 1</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2 W/o IFE</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 3</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
									</div>
								</div>
								<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/stages/oliver_stack.mp4" style="width:1000px; border: 5px solid #000;" type="video/mp4"></video>
							</div>
						</th>
						</tr>
						<tr>
						
						
						<th style="text-align: center; padding: 10px;">

							<div style="width: 1000px; margin: auto; border-bottom: 1px solid #000;">
								<div class="row justify-content-sm-center" style="display: flex; width: 1000px; height: 40px; margin-left: 0px">
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 1</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2 W/o IFE</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 3</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
									</div>
								</div>
								<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/stages/oliver1_stack.mp4" style="width:1000px; border: 5px solid #000;" type="video/mp4"></video>
							</div>
						</th>
						</tr>
						<tr>
						
						
						<th style="text-align: center; padding: 10px;">

							<div style="width: 1000px; margin: auto; border-bottom: 1px solid #000;">
								<div class="row justify-content-sm-center" style="display: flex; width: 1000px; height: 40px; margin-left: 0px">
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 1</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2 W/o IFE</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 3</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
									</div>
								</div>
								<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/stages/seth_stack.mp4" style="width:1000px; border: 5px solid #000;" type="video/mp4"></video>
							</div>
						</th>
						</tr>
						<tr>
						
						
						<th style="text-align: center; padding: 10px;">

							<div style="width: 1000px; margin: auto; border-bottom: 1px solid #000;">
								<div class="row justify-content-sm-center" style="display: flex; width: 1000px; height: 40px; margin-left: 0px">
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 1</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 2 W/o IFE</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Stage 3</p>
									</div>
									<div style="flex: 0 0 20%; text-align: center;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
									</div>
								</div>
								<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/stages/noah_stack.mp4" style="width:1000px; border: 5px solid #000;" type="video/mp4"></video>
							</div>
						</th>
						</tr>
					<tr>

						<th style="text-align: center; padding: 10px;">
							<div style="width: 800px; margin: auto; border-bottom: 1px solid #000;">
								<div class="row justify-content-sm-center" style="display: flex; width: 800px; height: 40px; margin-left: 0px;">
									<div style="flex: 0 0 45%; text-align: center; margin-right: 10px;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">W/o Motion Information</p>
									</div>
									<div style="flex: 0 0 45%; text-align: center; margin-left: 10px;">
										<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
									</div>
								</div>
								<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/audio/1_stack.mp4" style="width: 800px; border: 5px solid #000;" type="video/mp4"></video>
							</div>
						</th>
					</tr>
				<tr>
						
						
					<th style="text-align: center; padding: 10px;">
						<div style="width: 800px; margin: auto; border-bottom: 1px solid #000;">
							<div class="row justify-content-sm-center" style="display: flex; width: 800px; height: 40px; margin-left: 0px;">
								<div style="flex: 0 0 45%; text-align: center; margin-right: 10px;">
									<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">W/o Motion Information</p>
								</div>
								<div style="flex: 0 0 45%; text-align: center; margin-left: 10px;">
									<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
								</div>
							</div>
							<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/audio/10_stack.mp4" style="width: 800px; border: 5px solid #000;" type="video/mp4"></video>
						</div>
					</th>
				</tr>
			<tr>
				<th style="text-align: center; padding: 10px;">
					<div style="width: 800px; margin: auto; border-bottom: 1px solid #000;">
						<div class="row justify-content-sm-center" style="display: flex; width: 800px; height: 40px; margin-left: 0px;">
							<div style="flex: 0 0 45%; text-align: center; margin-right: 10px;">
								<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">W/o Motion Information</p>
							</div>
							<div style="flex: 0 0 45%; text-align: center; margin-left: 10px;">
								<p style="font-family: Chalkduster; font-size: 16px; margin: 0;">Ours</p>
							</div>
						</div>
						<video autoplay="autoplay" controls="controls" loop="loop" muted="muted" src="ablation/audio/11_stack.mp4" style="width: 800px; border: 5px solid #000;" type="video/mp4"></video>
					</div>
				</th>
			</tr>
						
	
				</tbody>
			</table>

		</div>

		<script src="./resources/jquery-3.4.1.slim.min.js"></script>
		<script src="./resources/popper.min.js"></script>
		<script src="./resources/bootstrap.min.js"></script>

	</body>


</html>

	</html>
