Abstract: Audio super-resolution aims to improve the quality of acoustic signals by reconstructing high-resolution signals from low-resolution inputs. However, acoustic signals can be represented in two forms, time-domain waveforms and frequency-domain spectrograms, and most existing research enhances data in only one of these domains, capturing only partial or local features of the audio signal and thus limiting the analysis. This paper therefore proposes a time-frequency domain fusion method for audio super-resolution that exploits the complementarity of the two representations. Specifically, we propose an end-to-end audio super-resolution network consisting of a variational autoencoder based waveform super-resolution module, a U-Net based spectrogram super-resolution module, and an attention-based time-frequency domain fusion module. The first two modules generate additional high-frequency and low-frequency components of the audio, respectively. As the critical component of our method, the time-frequency domain fusion module performs a weighted fusion of the two outputs to obtain the super-resolution audio signal. Experimental results on the VCTK and Piano datasets of natural-scene audio show that, compared with other methods, the proposed time-frequency domain fusion model achieves state-of-the-art bandwidth extension. Furthermore, we apply super-resolution to the ShipsEar dataset of underwater acoustic signals and use the enhanced signals for ship target recognition, improving accuracy by 12.66%. The proposed method therefore offers strong signal enhancement and generalization ability.
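The abstract describes the fusion step only at a high level. The following PyTorch sketch illustrates one plausible form of attention-based weighted fusion of the waveform-branch and spectrogram-branch outputs; the layer sizes, kernel widths, and per-sample softmax weighting are assumptions for illustration, not the paper's exact design.

```python
# Hedged sketch (not the authors' implementation) of attention-based
# time-frequency fusion: given the waveform-branch output and the
# spectrogram-branch output (already converted back to a waveform),
# learn per-sample weights and blend the two signals.
import torch
import torch.nn as nn

class TimeFrequencyFusion(nn.Module):
    def __init__(self, hidden: int = 32):
        super().__init__()
        # 1-D convolutional attention head: maps the two stacked branch
        # outputs to two per-sample weights that sum to 1 via softmax.
        self.attn = nn.Sequential(
            nn.Conv1d(2, hidden, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(hidden, 2, kernel_size=9, padding=4),
        )

    def forward(self, wave_sr: torch.Tensor, spec_sr: torch.Tensor) -> torch.Tensor:
        # wave_sr, spec_sr: (batch, samples) outputs of the two branches.
        x = torch.stack([wave_sr, spec_sr], dim=1)     # (batch, 2, samples)
        weights = torch.softmax(self.attn(x), dim=1)   # per-sample fusion weights
        return (weights * x).sum(dim=1)                # weighted fusion -> (batch, samples)

# Usage with dummy tensors standing in for the two branch outputs.
fusion = TimeFrequencyFusion()
wave_out = torch.randn(4, 16000)   # waveform-branch super-resolved audio
spec_out = torch.randn(4, 16000)   # spectrogram-branch output, back in the time domain
audio_sr = fusion(wave_out, spec_out)   # fused super-resolution audio, shape (4, 16000)
```

The softmax over the two channels makes the fusion an adaptive convex combination, so at each sample the network can lean on whichever branch is more reliable; this is one simple way to realize the weighted fusion the abstract describes.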