Abstract: This paper introduces a novel neural network model for rendering binaural audio directly from ambisonic recordings. The model is optimized end-to-end to learn a direct mapping between ambisonic and binaural signals. Our approach eliminates the traditional processing steps required to mitigate artifacts from spherical harmonic order truncation and spatial aliasing, as well as the complex filtering needed to compensate for near-field sound sources. To showcase the advantage of neural network-based rendering over traditional signal processing approaches, we introduce a new dataset that includes challenging near-field sound sources, including speech and background noise. We demonstrate that our model produces binaural audio that closely matches the fidelity of ground truth binaural recordings. Our comprehensive validation shows that the proposed method outperforms existing methods on several error metrics as well as in subjective evaluations. Model code, demos, and datasets are available on our project webpage.
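As a rough illustration of the "direct mapping" idea described above, the following sketch (an assumption for illustration only, not the authors' architecture) shows a first-order ambisonic signal (4 channels: W, X, Y, Z) being mapped to binaural audio (2 channels) by a small stack of 1-D convolutions; in the actual system the filter weights would be learned end-to-end against ground truth binaural recordings.

```python
# Minimal sketch of an end-to-end ambisonic-to-binaural mapping.
# NOT the paper's model: layer sizes, filter lengths, and the tanh
# nonlinearity are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w):
    """Apply a bank of 1-D filters with 'same' padding.
    x: (in_ch, n) signal; w: (out_ch, in_ch, k) filters -> (out_ch, n)."""
    out_ch, in_ch, k = w.shape
    n = x.shape[1]
    xp = np.pad(x, ((0, 0), (k // 2, k - 1 - k // 2)))
    y = np.zeros((out_ch, n))
    for o in range(out_ch):
        for i in range(in_ch):
            y[o] += np.convolve(xp[i], w[o, i], mode="valid")
    return y

# Random stand-ins for filter weights that would be learned end-to-end.
w1 = rng.standard_normal((8, 4, 33)) * 0.1   # ambisonics -> hidden features
w2 = rng.standard_normal((2, 8, 33)) * 0.1   # hidden features -> binaural

ambi = rng.standard_normal((4, 16000))       # 1 s of synthetic FOA at 16 kHz
hidden = np.tanh(conv1d(ambi, w1))           # nonlinearity between layers
binaural = conv1d(hidden, w2)                # left/right output channels
print(binaural.shape)
```

Because the mapping is learned directly from paired recordings, truncation-artifact suppression and near-field compensation need not be hand-designed as separate filtering stages; the network can absorb them into its learned filters.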
External IDs: dblp:conf/icassp/GebruKSKMBHR25