Neural Synthesis of Binaural Speech From Mono Audio

Alexander Richard; Dejan Markovic; Israel D. Gebru; Steven Krenn; Gladstone Alexander Butler; Fernando Torre; Yaser Sheikh

Published: 12 Jan 2021, Last Modified: 05 May 2023, ICLR 2021 Oral, Readers: Everyone

Keywords: binaural audio, sound spatialization, neural sound synthesis, binaural speech, speech processing, speech generation

Abstract: We present a neural rendering approach to binaural sound synthesis that can produce realistic, spatially accurate binaural sound in real time. The network takes a single-channel (mono) audio source as input and synthesizes two-channel binaural sound as output, conditioned on the position and orientation of the listener relative to the source. In a theoretical analysis, we identify deficiencies of the ℓ2-loss on raw waveforms, and we introduce an improved loss that overcomes these limitations. In an empirical evaluation, we establish that our approach is the first to generate spatially accurate waveform outputs (as measured against real recordings) and that it outperforms existing approaches by a considerable margin, both quantitatively and in a perceptual study. The dataset and code are available online.
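The abstract refers to deficiencies of the ℓ2-loss on raw waveforms without spelling them out. The sketch below is a generic illustration, not the paper's analysis or its improved loss. By Parseval's theorem, the waveform ℓ2 error equals the sum of per-bin DFT errors divided by N, and each bin's error expands as A² + Â² − 2AÂcos(Δφ). Two consequences follow. First, the phase term is weighted by AÂ, so phase errors in quiet bins barely affect the loss. Second, when the phase cannot be predicted exactly, the amplitude that minimizes the expected ℓ2 error shrinks by a factor of E[cos Δφ]. The helper names `bin_l2` and `best_amplitude_under_phase_noise` are hypothetical, as is the assumption of a uniform phase error.

```python
import numpy as np

# Hypothetical helper (not from the paper): squared error between a target
# Fourier coefficient A*exp(i*phi) and a prediction A_hat*exp(i*(phi + dphi)),
# expanded with the law of cosines.
def bin_l2(a_true, a_pred, dphi):
    return a_true**2 + a_pred**2 - 2.0 * a_true * a_pred * np.cos(dphi)


# Hypothetical helper: amplitude that minimizes the *expected* l2 error when the
# phase error dphi is assumed uniform on [-max_phase_err, max_phase_err].
# E[loss] = A^2 + A_hat^2 - 2*A*A_hat*E[cos(dphi)]; E[cos] is estimated by
# Monte Carlo, then the loss is minimized over a grid of amplitudes in [0, A].
def best_amplitude_under_phase_noise(a_true, max_phase_err, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    dphi = rng.uniform(-max_phase_err, max_phase_err, size=n_samples)
    mean_cos = np.cos(dphi).mean()
    grid = np.linspace(0.0, a_true, 1001)
    expected = a_true**2 + grid**2 - 2.0 * a_true * grid * mean_cos
    return float(grid[np.argmin(expected)])


if __name__ == "__main__":
    # Parseval: the waveform l2 error equals the summed per-bin DFT errors
    # divided by N (numpy's default unnormalized FFT), so the per-bin view is exact.
    rng = np.random.default_rng(1)
    y, y_hat = rng.standard_normal(256), rng.standard_normal(256)
    time_l2 = np.sum((y_hat - y) ** 2)
    freq_l2 = np.sum(np.abs(np.fft.fft(y_hat) - np.fft.fft(y)) ** 2) / len(y)
    print(np.isclose(time_l2, freq_l2))  # True

    # The same 90-degree phase error produces very different losses, because the
    # phase term is weighted by A * A_hat and quiet bins barely register.
    print(bin_l2(1.0, 1.0, np.pi / 2))    # ~2.0
    print(bin_l2(0.01, 0.01, np.pi / 2))  # ~2e-4

    # Under phase uncertainty the l2-optimal amplitude shrinks toward zero,
    # to roughly A * sin(d) / d for a uniform error on [-d, d].
    print(best_amplitude_under_phase_noise(1.0, np.pi / 2))  # close to 2/pi, about 0.64
    print(best_amplitude_under_phase_noise(1.0, np.pi))      # close to 0
```

This section does not state how the paper's improved loss handles these effects. The sketch only makes the generic amplitude/phase trade-off of a plain waveform ℓ2 objective concrete.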

One-sentence Summary: We propose an end-to-end approach to neural binaural sound synthesis that, for the first time, outperforms DSP-based methods both in a quantitative evaluation and in a perceptual study.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Supplementary Material: zip