Keywords: embodied learning, audio-visual learning, visual acoustic learning, acoustic simulation, sim2real
TL;DR: We are releasing SoundSpaces 2.0: a fast, continuous, configurable and generalizable audio-visual simulation platform for visual acoustic machine learning research, e.g., audio-visual navigation, far-field speech recognition, and acoustic matching.
Abstract: We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio rendering for 3D environments. Given a 3D mesh of a real-world environment, SoundSpaces can generate highly realistic acoustics for arbitrary sounds captured from arbitrary microphone locations. Together with existing 3D visual assets, it supports an array of audio-visual research tasks, such as audio-visual navigation, mapping, source localization and separation, and acoustic matching. Compared to existing resources, SoundSpaces 2.0 has the advantages of allowing continuous spatial sampling, generalization to novel environments, and configurable microphone and material properties. To our knowledge, this is the first geometry-based acoustic simulation that offers high fidelity and realism while also being fast enough to use for embodied learning. We showcase the simulator's properties and benchmark its performance against real-world audio measurements. In addition, we demonstrate two downstream tasks (embodied navigation and far-field automatic speech recognition) and highlight sim2real performance for the latter. SoundSpaces 2.0 is publicly available to facilitate wider research for perceptual systems that can both see and hear.
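As a rough illustration of what "generating acoustics for arbitrary sounds" means in practice, the sketch below convolves a dry source signal with a room impulse response (RIR), which is the standard way a rendered RIR is applied to a sound. It does not use the SoundSpaces or Habitat-Sim API: `synthetic_rir` and `render` are hypothetical helper names, and the RIR here is a toy stand-in (exponentially decaying noise) for what a geometry-based simulator would compute from a mesh, a source position, and a microphone position.

```python
import numpy as np

def synthetic_rir(sample_rate=16000, rt60=0.5, length_s=1.0, seed=0):
    """Toy RIR (assumption): noise whose amplitude decays by 60 dB at t = rt60.

    A real simulator would derive the RIR from scene geometry and materials;
    this placeholder only mimics a reverberant tail.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate * length_s)) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # 10^-3 in amplitude equals -60 dB
    rir = rng.standard_normal(t.size) * decay
    rir[0] = 1.0  # crude direct-path impulse
    return rir / np.max(np.abs(rir))

def render(dry, rir):
    """Apply the RIR to a dry (anechoic) signal by full linear convolution."""
    return np.convolve(dry, rir)

if __name__ == "__main__":
    sr = 16000
    rir = synthetic_rir(sample_rate=sr)
    # One second of a 440 Hz tone as the dry source.
    dry = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
    wet = render(dry, rir)
    print(wet.shape)  # length is len(dry) + len(rir) - 1
```

Changing the source or microphone position in a simulator changes only the RIR; the convolution step stays the same, which is why continuous spatial sampling amounts to rendering a new RIR per location.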
Supplementary Material: zip
Contribution Process Agreement: Yes
In Person Attendance: Yes
Dataset Url: The binaries of RLR-Audio-Propagation are released at https://github.com/facebookresearch/rlr-audio-propagation. The integration with Habitat-Sim is available at https://github.com/facebookresearch/habitat-sim/blob/main/docs/AUDIO.md. The high-level APIs for tasks and training scripts are available at https://github.com/facebookresearch/sound-spaces.
License: License for SoundSpaces: CC-BY-4.0. License for RLR-Audio-Propagation: CC-BY-NC. License for Habitat-Sim: MIT.
Author Statement: Yes
Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/arxiv:2206.08312/code)