Audio Super-Resolution using Neural Networks

Volodymyr Kuleshov, S. Zayd Enam, Stefano Ermon

Feb 17, 2017 (modified: Mar 02, 2017) ICLR 2017 workshop submission readers: everyone
  • Abstract: We propose a neural network-based technique for enhancing the quality of audio signals such as speech or music by transforming inputs encoded at low sampling rates into higher-quality signals with increased resolution in the time domain. This amounts to generating the samples that are missing from the low-resolution signal, in a process akin to image super-resolution. On standard speech and music datasets, this approach outperforms baselines at 2x, 4x, and 6x upscaling ratios. The method has practical applications in telephony, compression, and text-to-speech generation; it can also be used to improve the scalability of recently proposed generative models of audio.
  • Conflicts: mcgill.ca, cornell.edu, berkeley.edu
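
The problem setup in the abstract (a signal at a low sampling rate whose missing samples must be generated) can be illustrated with a minimal Python sketch. This is not the authors' method: linear interpolation stands in for the neural network and is not necessarily one of the baselines the paper evaluates. The 440 Hz test tone, the 16 kHz sampling rate, the signal-to-noise metric, and all function names are assumptions made for illustration only.

```python
import math


def downsample(signal, ratio):
    """Keep every `ratio`-th sample to simulate a lower sampling rate.

    Assumption: plain decimation with no anti-aliasing filter.
    """
    return signal[::ratio]


def upsample_linear(low_res, ratio):
    """Generate the missing samples by linear interpolation.

    Stand-in for a learned model; the known samples are kept unchanged.
    """
    out = []
    for i in range(len(low_res) - 1):
        a, b = low_res[i], low_res[i + 1]
        for k in range(ratio):
            out.append(a + (b - a) * k / ratio)
    out.append(low_res[-1])
    return out


def snr_db(reference, estimate):
    """Signal-to-noise ratio in decibels (assumed metric, higher is better)."""
    signal_power = sum(x * x for x in reference)
    noise_power = sum((x - y) ** 2 for x, y in zip(reference, estimate))
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a 440 Hz tone sampled at 16 kHz.
# 1201 samples, so that (length - 1) is divisible by 2, 4, and 6.
SAMPLE_RATE = 16000
high_res = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1201)]

results = {}
for ratio in (2, 4, 6):
    low_res = downsample(high_res, ratio)
    restored = upsample_linear(low_res, ratio)
    results[ratio] = snr_db(high_res, restored)
    print(f"{ratio}x upscaling: SNR = {results[ratio]:.1f} dB")
```

In this toy setting the reconstruction quality drops as the upscaling ratio grows, because more samples have to be filled in between each pair of known ones. A learned model would replace `upsample_linear` while keeping the same input and output shapes.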
