<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">

<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"
      integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<link href='https://fonts.googleapis.com/css?family=Lato:300,400,900' rel='stylesheet' type='text/css'>
<link href="style.css" rel="stylesheet">
<title>Masked Audio Generative Modeling</title>
<style>
    audio {
    width: 110px;
    }

    audio::-webkit-media-controls-volume-slider {
    display: none !important;
    }

    audio::-webkit-media-controls-timeline-container {
    display: none !important;
    }

    audio::-webkit-media-controls-time-remaining-display {
    display: none !important;
    }

    audio::-webkit-media-controls-timeline {
    display: none !important;
    }
</style>
</head>
<body>

    <div id="header" class="container-fluid">
        <div class="row" style="text-align: center;">
            <h1>Masked Audio Generative Modeling</h1>
        </div>
    </div>

<div class="container">
    <h2>Abstract</h2>
    <p>
        We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of discrete audio representations, i.e., tokens.
        Unlike prior work, MAGNeT consists of a single-stage, non-autoregressive transformer encoder. During training, we predict spans of masked tokens obtained from a masking scheduler, while during inference we gradually construct
        the output sequence using several decoding steps. To further enhance the quality of the generated audio, we introduce a novel rescoring method, in which we leverage an external pre-trained model to rescore and rank predictions from
        MAGNeT, which are then used in later decoding steps. Lastly, we explore a hybrid version of MAGNeT that fuses autoregressive and non-autoregressive models: the first few seconds are generated autoregressively, while the rest of the sequence is decoded in parallel. We demonstrate the
        efficiency of MAGNeT on the task of text-to-music generation and conduct an extensive empirical evaluation, including both automatic metrics and human studies. We show that the proposed approach is comparable to the evaluated baselines while being
        significantly faster (7&times; faster than the autoregressive baseline). Through ablation studies and analysis, we shed light on the importance of each component of MAGNeT and point out the trade-offs between autoregressive and non-autoregressive modeling in terms of latency, throughput, and generation
        quality. Samples are available as part of the supplemental material.
    </p>
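    <p>
        The iterative decoding procedure described above can be illustrated with a short JavaScript sketch. It is not the
        authors' implementation: it assumes a MaskGIT-style cosine masking schedule, decodes a single token stream instead of
        several, masks individual tokens rather than spans, omits the rescoring step, and uses a hypothetical
        <code>predict</code> function in place of the transformer. At every step, all masked positions are filled, and the
        least confident new predictions are masked again, so the number of masked tokens shrinks to zero by the last step.
    </p>
<pre><code class="language-javascript">// Sketch only, not the MAGNeT implementation: MaskGIT-style iterative decoding
// with an assumed cosine masking schedule. `predict` is a hypothetical stand-in
// for the model; given the current tokens it returns one { token, prob } guess
// per position.
const MASK = -1;

// Fraction of positions left masked after step i of s (1 at i = 0, about 0 at i = s).
function maskRatio(i, s) {
  return Math.cos((Math.PI / 2) * (i / s));
}

function decode(length, steps, predict) {
  let tokens = new Array(length).fill(MASK);
  const stepIndices = Array.from({ length: steps }, (_, k) => k + 1);
  for (const i of stepIndices) {
    const guesses = predict(tokens);
    // Fill every masked slot with the model's guess. Tokens fixed in earlier
    // steps get a score of 2 (above any probability), so they are never re-masked.
    const scored = tokens.map((t, pos) =>
      t === MASK
        ? { pos, token: guesses[pos].token, score: guesses[pos].prob }
        : { pos, token: t, score: 2 }
    );
    tokens = scored.map((c) => c.token);
    // Re-mask the least confident new predictions, as dictated by the schedule.
    const numMasked = Math.floor(length * maskRatio(i, steps));
    scored
      .slice()
      .sort((a, b) => a.score - b.score)
      .slice(0, numMasked)
      .forEach((c) => {
        tokens[c.pos] = MASK;
      });
  }
  return tokens;
}

// Usage with a toy predictor that always guesses token = position.
const toyPredict = (tokens) =>
  tokens.map((_, pos) => ({ token: pos, prob: (pos % 3) / 3 }));
console.log(decode(8, 4, toyPredict).join(",")); // 0,1,2,3,4,5,6,7
</code></pre>
    <p>
        In this sketch the number of forward passes equals <code>steps</code>, independent of the sequence length, which is
        what makes this style of decoding cheaper than generating one token per forward pass.
    </p>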
</div>

<div class="container" id="abstractdiv">
    <div class="row">
        <h2>Text-to-Music</h2>
        <p>In the following, we present samples from MAGNeT, <a href="https://ai.honu.io/papers/musicgen/">MusicGen</a>,
            <a href="https://google-research.github.io/seanet/musiclm/examples/">MusicLM</a>
            (using the public <a href="https://aitestkitchen.withgoogle.com/experiments/music-lm">AI Test Kitchen demo</a>),
            <a href="https://huggingface.co/spaces/haoheliu/audioldm2-text2audio-text2music">AudioLDM2</a>,
            and <a href="https://github.com/archinetai/audio-diffusion-pytorch">Mousai</a>, which we retrained
            on the same dataset as MAGNeT.
        </p>
        <table class="table table-responsive" width="100%">
            <thead>
            <tr class="text-center">
                <td>Description</td>
                <td><b>MAGNeT</b></td>
                <td>MusicGen</td>
                <td>MusicLM</td>
                <td>AudioLDM2</td>
                <td>Mousai</td>
            </tr>
            </thead>
            <tbody>
            <tr>
                <td class="desc">Earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/MAGNeT-large/3_earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musicgen_2.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musiclm_2.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_2_m.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/mousai_2.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">80s electronic track with melodic synthesizers, catchy beat and groovy bass</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/MAGNeT-large/8_80s electronic track with melodic synthesizers, catchy beat and groovy bass.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musicgen_3.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musiclm_3.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_3_m.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/mousai_3.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">Smooth jazz, with a saxophone solo, piano chords, and snare full drums</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/MAGNeT-large/11_smooth jazz, with a saxophone solo, piano chords, and snare full drums.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musicgen_4.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musiclm_4.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_4_m.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/mousai_4.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/MAGNeT-large/1_A grand orchestral arrangement with thunderous percussion, epic brass fanfares, and soaring strings, creating a cinematic atmosphere fit for a heroic battle..mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musicgen_1.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musiclm_1.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_1_m.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/mousai_1.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">Rock with saturated guitars, a heavy bass line and crazy drum break and fills</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/MAGNeT-large/15_rock with saturated guitars, a heavy bass line and crazy drum break and fills..mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musicgen_5.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/musiclm_5.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_5_m.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/mousai_5.mp3" type="audio/mpeg"></audio></td>
            </tr>
            </tbody>
        </table>
    </div>
    <div class="row">
        <h2>Text-to-Audio</h2>
        <p>In the following, we present samples from MAGNeT, <a href="https://felixkreuk.github.io/audiogen/">AudioGen</a>, and <a href="https://huggingface.co/spaces/haoheliu/audioldm2-text2audio-text2music">AudioLDM2</a>.
        </p>
        <table class="table table-responsive" width="100%">
            <thead>
            <tr class="text-center">
                <td>Description</td>
                <td><b>MAGNeT</b></td>
                <td>AudioGen</td>
                <td>AudioLDM2</td>
            </tr>
            </thead>
            <tbody>
            <tr>
                <td class="desc">Whistling with wind blowing</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/AudioMAGNeT-large/1_whistling with wind blowing.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audiogen_1.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_1_a.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">A toilet flushing as music is playing and a man is singing in the distance</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/AudioMAGNeT-large/a toilet flushing as music is playing and a man is singing in the distance.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audiogen_2.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_2_a.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">Pigeons are making grunting sounds and snapping beaks</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/AudioMAGNeT-large/pigeons are making grunting sounds and snapping beaks.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audiogen_3.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_3_a.mp3" type="audio/mpeg"></audio></td>
            </tr>
            <tr>
                <td class="desc">Seagulls squawking as ocean waves crash while wind blows heavily into a microphone</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/AudioMAGNeT-large/seagulls squawking as ocean waves crash while wind blows heavily into a microphone.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audiogen_4.mp3" type="audio/mpeg"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/audioldm2_4_a.mp3" type="audio/mpeg"></audio></td>
            </tr>
            </tbody>
        </table>
    </div>
    <div class="row">
        <h2>Hybrid-MAGNeT</h2>
        <p>We present samples of Hybrid-MAGNeT, in which the first 5 seconds were generated autoregressively and the rest of the sequence was decoded non-autoregressively, in parallel.</p>
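        <p>The hybrid scheme can be sketched in a few lines of JavaScript. This is a hypothetical orchestration, not the
            authors' code: <code>arStep</code> and <code>parallelDecode</code> stand in for the autoregressive and
            non-autoregressive models, and the 50 Hz token frame rate is an assumption. The sketch shows where the latency
            trade-off comes from: the autoregressive prefix costs one sequential forward pass per frame, while the remainder
            is handed to the parallel decoder in a single call.</p>
<pre><code class="language-javascript">// Sketch only: how a hybrid AR / non-AR generator could be orchestrated.
// `arStep(prefix)` (returns the next frame) and `parallelDecode(prefix, n)`
// (returns n frames) are hypothetical stand-ins for the two models.
function hybridGenerate(totalSec, arSec, frameRate, arStep, parallelDecode) {
  const totalFrames = Math.round(totalSec * frameRate);
  const arFrames = Math.min(Math.round(arSec * frameRate), totalFrames);
  const prefix = [];
  // 1) Autoregressive prefix: one forward pass per frame, each conditioned on all previous frames.
  while (prefix.length !== arFrames) {
    prefix.push(arStep(prefix));
  }
  // 2) Non-autoregressive remainder: decoded in parallel, conditioned on the prefix.
  const rest = parallelDecode(prefix, totalFrames - arFrames);
  return { frames: prefix.concat(rest), arCalls: arFrames };
}

// Usage: 10 s of audio, 5 s autoregressive prefix, assumed 50 Hz token frame rate.
const out = hybridGenerate(
  10, 5, 50,
  (prefix) => prefix.length,
  (prefix, n) => Array.from({ length: n }, (_, k) => prefix.length + k)
);
console.log(out.frames.length, out.arCalls); // 500 250
</code></pre>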
        <table class="table table-responsive">
            <thead>
            <tr class="text-center">
                <td>Description</td>
                <td><b>Hybrid-MAGNeT</b></td>
            </tr>
            </thead>
            <tbody>
            <tr>
                <td class="desc">Hypnotic and bouncy, with hip hop trap elements featuring trippy synthesizer and synth drums to create a content and chill mood</td>
                <td class="text-center"><audio controls class="long_audio"><source src="samples/hybrid/Hypnotic and bouncy, with hip hop trap elements featuring trippy synthesizer and synth drums to create a content and chill mood..wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Funky and confident, featuring groovy electric guitar, keyboards that create a chill, laid-back mood</td>
                <td class="text-center"><audio controls class="long_audio"><source src="samples/hybrid/Funky and confident, featuring groovy electric guitar, keyboards that create a chill, laid-back mood.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Heavy, hard and driving, in the style of Pop Punk, featuring edgy electric guitar that creates a bold, rebellious mood</td>
                <td class="text-center"><audio controls class="long_audio"><source src="samples/hybrid/Heavy, hard and driving, in the style of Pop Punk, featuring edgy electric guitar that creates a bold, rebellious mood.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Contemporary Jazz Waltz featuring a fabulous guitar solo</td>
                <td class="text-center"><audio controls class="long_audio"><source src="samples/hybrid/Contemporary Jazz Waltz featuring a fabulous guitar solo.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Bright and groovy, featuring a Tropical House feel and warm synth textures that create an enthusiastic mood.</td>
                <td class="text-center"><audio controls class="long_audio"><source src="samples/hybrid/Bright and groovy, featuring a Tropical House feel and warm synth textures that create an enthusiastic mood.wav" type="audio/wav"></audio></td>
            </tr>
            </tbody>
        </table>
    </div>
    
    <div class="row">
        <h2>Restricted Temporal Context - Analysis</h2>
        <p>We present 10-second samples from MAGNeT trained with and without the temporal context restriction as defined in our paper.</p>
        <table class="table table-responsive">

            <thead>
            <tr class="text-center">
                <td>Description</td>
                <td>MAGNeT without restricted context</td>
                <td><b>MAGNeT</b></td>
            </tr>
            </thead>
            <tbody>
            <tr>
                <td class="desc">House track with pads and synths creating a tripping harmony</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/1_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/1_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">House track with pads and synths creating a tripping harmony</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/2_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/2_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">House track with pads and synths creating a tripping harmony</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/3_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/3_House track with pads and synths creating a tripping harmony.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Funky groove with electric piano playing blue chords rhythmically</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/4_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/4_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Funky groove with electric piano playing blue chords rhythmically</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/5_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/5_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
            </tr>
            <tr>
                <td class="desc">Funky groove with electric piano playing blue chords rhythmically</td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/without_restricted_context/6_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
                <td class="text-center"><audio controls class="sample_audio"><source src="samples/with_restricted_context/6_Funky groove with electric piano playing blue chords rhythmically.wav" type="audio/wav"></audio></td>
            </tr>
            </tbody>
        </table>
    </div>

</div>

<script>
// When one sample starts playing, pause all the others so that only one plays at a time.
function setupCallback(elem, elems) {
  elem.addEventListener("play", function () {
    for (var other of elems) {
      if (other !== elem) {
        other.pause();
      }
    }
  });
}

document.addEventListener('DOMContentLoaded', function () {
  var elems = document.body.getElementsByTagName("audio");
  for (var elem of elems) {
    setupCallback(elem, elems);
  }
});
</script>
</body>
</html>

