<!DOCTYPE html>
<html>
  <head>
    <title>Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models</title>
    <link href="style.css" rel="stylesheet">
    <script src="script.js" type="text/javascript"></script>
  </head>
  <body>
    <main>
      <div>
        <section id="style_eq">
          <h1 class="author">Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models </h1>
          <p>We propose <em>style equalization</em>, a method that lets generative sequence models control style and content separately and in detail, without requiring any style labels during training or inference. Our method enables many novel functionalities, including auto-completing/correcting handwriting, generating speech in different voices, generating missing training samples for a downstream recognizer, and analyzing the biases and failures of a recognizer.</p>
          <p>Our proposed method is generic and can be easily applied to various signal domains. In the following, we showcase our models on two different tasks: handwriting synthesis and speech synthesis.</p>
        </section>
        <section id="speech_synthesis">
          <h2>Speech Synthesis</h2>
          <p>Given a reference speech recording, our model generates new audio that sounds like it was recorded <em>in the original environment by the same speaker.</em> In other words, we mimic the voice characteristics of the speaker, the background noise, the echo, the microphone response, etc., but with our target content.</p>
          <p>In the video below, we type the content in the input text box (top row), use the slider to choose a random speech audio as the style reference input (middle row), and synthesize the input text with the style of the reference audio (bottom row). Please turn on your audio.</p><video src="assets/videos/speech.mp4" controls class="video_speech"></video>
          <p>As can be heard, our method accurately mimics the style of the reference example while producing the correct content.</p>
          <p><b>Here is a quick comparison with the global style token method, which is also unsupervised. </b> <br>The goal is to read the input text in the same style (e.g., voice characteristics, background noise, echo, etc.) as the style input.</p>
          <div>
            <p><em>Input text 1: I did not see any reason to change the captain. </em></p>
          </div>
          <table class="peek_table">
            <tr>
              <th class="text_header">style text</th>
              <th>style input</th>
              <th>global style token</th>
              <th>proposed</th>
            </tr>
            <tr>
              <th class="text">When the candle ends sent up their conical yellow flames, all the colored figures from Austria stood out clear and full of meaning against the green boughs.</th>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/0/style_from.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/0/gst-192.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/0/proposed.mp3" controls preload="metadata"></audio></td>
            </tr>
            <tr>
              <th class="text">The man shrugged his broad shoulders and turned back into the arabesque chamber.</th>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/1/style_from.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/1/gst-192.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text0/1/proposed.mp3" controls preload="metadata"></audio></td>
            </tr>
          </table>
          <div>
            <p><em>Input text 2: Next year it plans to open an office in Tokyo. </em></p>
          </div>
          <table class="peek_table">
            <tr>
              <th class="text_header">style text</th>
              <th>style input</th>
              <th>global style token</th>
              <th>proposed</th>
            </tr>
            <tr>
              <th class="text">I had meant it to be the story of my life, but how little of my life is in it!</th>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/0/style_from.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/0/gst-192.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/0/proposed.mp3" controls preload="metadata"></audio></td>
            </tr>
            <tr>
              <th class="text">Every landscape, low and high, seems doomed to be trampled and harried.</th>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/1/style_from.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/1/gst-192.mp3" controls preload="metadata"></audio></td>
              <td><audio src="assets/libritts/nonparallel_text_unseen/text1/1/proposed.mp3" controls preload="metadata"></audio></td>
            </tr>
          </table>
          <p></p>
          <div>
            <a href="_speech.html">Please click to see a detailed comparison.</a>
          </div>
        </section>
        <section id="handwriting_synthesis">
          <h2>Handwriting Synthesis</h2>
          <p>Given a reference handwriting sample, which comprises a sequence of pen movements, our model generates new handwriting in the same writing style.</p>
          <p>In the video below, we type the content in the input text box (top row), use the slider to choose a random style (rasterized style handwriting is shown in parallel with the selection in the middle row) and synthesize the input content with the selected handwriting style (shown as a sequence of strokes in the bottom row).</p><video src="assets/videos/handwriting.mp4#t=38" controls loop></video>
          <p>*For privacy reasons, the style references used in this video are synthetic. They are close reproductions of unseen real styles, produced by a generative model with a different architecture. The generations are very similar when real samples are used as style input. Note that all evaluations reported in the paper use real, unseen style examples.</p>
          <div>
            <a href="_handwriting.html">Please click to see more handwriting examples.</a>
          </div>
        </section>
        <section id="intro">
          <h2>Quick introduction video</h2><video src="assets/videos/intro.mp4" controls class="video_speech"></video>
        </section>
      </div>
      <nav class="section-nav">
        <ol>
          <li>
            <a href="#style_eq">Style Equalization</a>
          </li>
          <li>
            <a href="#speech_synthesis">Speech Synthesis</a>
            <ul>
              <li>
                <a href="_speech.html#libritts_unseen_nonparallel">LibriTTS unseen speaker</a>
              </li>
              <li>
                <a href="_speech.html#libritts_unseen_ablation">LibriTTS ablation study</a>
              </li>
              <li>
                <a href="_speech.html#libritts_interpolation">LibriTTS unseen style interpolation</a>
              </li>
              <li>
                <a href="_speech.html#libritts_seen_nonparallel">LibriTTS seen speaker</a>
              </li>
              <li>
                <a href="_speech.html#libritts_prior">LibriTTS random styles from prior distribution</a>
              </li>
              <li>
                <a href="_speech.html#vctk_nonparallel">VCTK nonparallel text</a>
              </li>
              <li>
                <a href="_speech.html#vctk_parallel">VCTK parallel text</a>
              </li>
            </ul>
          </li>
          <li>
            <a href="#handwriting_synthesis">Handwriting Synthesis</a>
            <ul>
              <li>
                <a href="_handwriting.html#handwriting_nonparallel">Nonparallel text</a>
              </li>
              <li>
                <a href="_handwriting.html#handwriting_parallel">Parallel text</a>
              </li>
              <li>
                <a href="_handwriting.html#handwriting_prior">Random samples from prior</a>
              </li>
              <li>
                <a href="_handwriting.html#handwriting_interpolation">Style interpolation</a>
              </li>
            </ul>
          </li>
          <li>
            <a href="#intro">Quick introduction video</a>
          </li>
        </ol>
      </nav>
    </main>
  </body>
</html>