Abstract: Estimating room impulse responses (RIRs) in real spaces is time-consuming and expensive, requiring multiple pieces of equipment, recordings, and processing. A simple computer-vision-based method is proposed that estimates the acoustic material properties of a space from a single 360° photo by reconstructing an approximate 3D geometry. A 3D semantic geometry model is reconstructed from the 360° image using monocular depth estimation and semantic scene completion. The material properties of the semantic objects in the scene are estimated using a transformer-based dense material segmentation method. The resulting model is used to simulate a 3D acoustic room model on the Unity platform with the Steam Audio spatial audio plug-in. The acoustic properties of the space are estimated from this virtual reproduction and evaluated against the actual properties of the real environment.
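As a rough illustration of the geometry step, the sketch below back-projects a depth map aligned with an equirectangular (360°) image into 3D points. It is not the authors' implementation: the function names, the y-up coordinate frame with z pointing at the image centre, and the treatment of depth as radial distance from the camera in metres are all assumptions made for this example.

```python
import math


def equirect_pixel_to_point(u, v, depth, width, height):
    """Back-project one equirectangular pixel to a 3D point.

    Assumptions (not from the paper): depth is the radial distance from
    the camera centre, the frame is y-up, and +z points at the image centre.
    """
    # Longitude spans [-pi, pi) across the image width.
    lon = ((u + 0.5) / width) * 2.0 * math.pi - math.pi
    # Latitude runs from +pi/2 at the top row to -pi/2 at the bottom row.
    lat = math.pi / 2.0 - ((v + 0.5) / height) * math.pi
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)


def depth_map_to_points(depth_map):
    """Convert a row-major list-of-lists depth map into a list of 3D points."""
    height = len(depth_map)
    width = len(depth_map[0])
    return [
        equirect_pixel_to_point(u, v, depth_map[v][u], width, height)
        for v in range(height)
        for u in range(width)
    ]
```

Under these assumptions every point lies at its estimated depth from the camera, so a constant depth map produces points on a sphere. A real pipeline would take the depth map from a monocular depth estimator and would attach semantic and material labels to each point before building the room model.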