Head-Related Transfer Function Upsampling With Spatial Extrapolation Features

Jiale Zhao, Dingding Yao, Junfeng Li

Published: 01 Jan 2025, Last Modified: 28 Feb 2026 | IEEE Transactions on Audio, Speech and Language Processing | Readers: Everyone | License: CC BY-SA 4.0
Abstract: Head-related transfer functions (HRTFs) with high spatial resolution play a crucial role in spatial audio rendering. Acoustic measurement, the direct way to obtain HRTFs, is usually time-consuming and costly. An alternative is to upsample low-resolution HRTFs to increase the spatial sampling density. However, the magnitudes and phases of HRTFs vary rapidly with source position, so the upsampling errors of existing methods grow as the measurements become sparser. To upsample HRTFs with lower error, this study proposes a neural network-based model that incorporates a spatial extrapolation feature. The model consists of two components: an encoder that extracts the spatial extrapolation feature from sparse measurements in order to extrapolate the distribution of magnitudes and interaural time differences (ITDs) on high-spatial-resolution grids, and a separate network that predicts high-resolution spectra or ITDs from that feature. The spatial extrapolation feature represents the relationship between HRTF features on low-spatial-resolution grids and those on high-spatial-resolution grids. To verify that the original input information is preserved and to assess the limitations of the proposed method, a decoder that reconstructs the input measurements is additionally included. Compared with existing methods, the proposed model concentrates on increasing the sampling density without being distracted by the compression and reconstruction of HRTF features. Objective and subjective evaluations on measured and simulated HRTF datasets confirmed that the proposed method outperforms existing methods.
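The data flow described in the abstract (encoder, separate prediction network, auxiliary decoder) can be illustrated with a minimal, untrained sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes, the use of plain fully connected layers, the constants `N_SPARSE`, `N_DENSE`, `N_FREQ`, `N_FEAT`, and the choice to condition the prediction network on a target direction given as a unit vector. The sketch only shows how the three parts could connect and what shapes they would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Randomly initialised (untrained) affine layer: (weights, bias)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def mlp(x, layers):
    """Apply affine layers with tanh in between; no activation on the last layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

# Hypothetical sizes, chosen only for illustration.
N_SPARSE, N_DENSE, N_FREQ, N_FEAT = 8, 64, 32, 16

# Encoder: sparse magnitude spectra -> spatial extrapolation feature.
encoder = [dense(N_SPARSE * N_FREQ, 64), dense(64, N_FEAT)]
# Prediction network: feature + target direction -> spectrum at that direction.
predictor = [dense(N_FEAT + 3, 64), dense(64, N_FREQ)]
# Auxiliary decoder: feature -> reconstruction of the sparse input.
decoder = [dense(N_FEAT, 64), dense(64, N_SPARSE * N_FREQ)]

def upsample(sparse_mags, target_dirs):
    """Return (feature, predicted dense spectra, reconstructed sparse input)."""
    feat = mlp(sparse_mags.reshape(-1), encoder)
    # Pair the single feature vector with every target direction.
    pred_in = np.hstack([np.tile(feat, (len(target_dirs), 1)), target_dirs])
    dense_mags = mlp(pred_in, predictor)
    recon = mlp(feat, decoder).reshape(sparse_mags.shape)
    return feat, dense_mags, recon

# Synthetic stand-in data: random spectra and random unit direction vectors.
sparse_mags = rng.normal(size=(N_SPARSE, N_FREQ))
dirs = rng.normal(size=(N_DENSE, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

feat, dense_mags, recon = upsample(sparse_mags, dirs)
print(feat.shape, dense_mags.shape, recon.shape)
```

In a trained version of such a sketch, the reconstruction output would serve only as a check that the feature still carries the input information, while the prediction branch would carry the upsampling itself.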