Synthnet: Learning synthesizers end-to-end

Florin Schimbinschi; Christian Walder; Sarah Erfani; James Bailey

27 Sept 2018 (modified: 05 May 2023), ICLR 2019 Conference Blind Submission, Readers: Everyone

Abstract: Learning synthesizers and generating music in the raw audio domain is a challenging task. We investigate the learned representations of convolutional autoregressive generative models. Consequently, we show that mappings between musical notes and the harmonic style (instrument timbre) can be learned from the raw audio recording and the musical score (in binary piano roll format). Our proposed architecture, SynthNet, uses minimal training data (9 minutes), is substantially better in quality, and converges 6 times faster than the baselines. The quality of the generated waveforms (generation accuracy) is sufficiently high that they are almost identical to the ground truth. Therefore, we are able to directly measure generation error during training, based on the RMSE of the Constant-Q transform. Mean opinion scores are also provided. We validate our work using 7 distinct harmonic styles and also provide visualizations and links to all generated audio.

Keywords: audio, synthesizers, music, convolutional neural networks, generative models, autoregressive models

TL;DR: A convolutional autoregressive generative model that generates high-fidelity audio, benchmarked on music
