Abstract: Audio super-resolution aims to improve the quality of acoustic signals by reconstructing high-resolution signals from their low-resolution counterparts. Acoustic signals can be represented in two forms, time-domain waveforms and frequency-domain spectrograms, yet most existing research enhances data in a single domain. Such methods capture only partial or local features of the audio signal, which limits data analysis. This paper therefore proposes a time-frequency domain fusion enhanced audio super-resolution method that mines the complementarity of the two representations. Specifically, we propose an end-to-end audio super-resolution network comprising a variational-autoencoder-based Sound Wave Super-Resolution Module (SWSRM), a U-Net-based Spectrogram Super-Resolution Module (SSRM), and an attention-based Time-Frequency Domain Fusion Module (TFDFM). SWSRM and SSRM generate more high-frequency and low-frequency components for the audio, respectively. As a critical component of our method, TFDFM performs weighted fusion of these two outputs to obtain the super-resolution audio signal. Experimental results on the VCTK and Piano datasets in natural scenes show that the time-frequency domain fusion model achieves state-of-the-art bandwidth expansion compared with other methods. Furthermore, we perform super-resolution on the ShipsEar dataset of underwater acoustic signals. When the super-resolution results are used for ship target recognition, the accuracy improves by 12.66%. The proposed method therefore shows strong signal enhancement and generalization ability.
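The weighted fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how TFDFM computes its attention weights, so everything here is an assumption made for illustration: the function name `fuse`, the per-branch linear scoring with parameters `w` and `b`, and the premise that both branches have already been converted to equal-length waveforms. It shows only the general idea of a softmax-weighted blend of two estimates, not the paper's module.

```python
import numpy as np

def softmax(x, axis=0):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse(time_branch, freq_branch, w, b):
    """Blend two equal-length waveform estimates with per-sample weights.

    time_branch, freq_branch: 1-D arrays of shape (T,), standing in for the
    SWSRM and SSRM outputs (assumed already in waveform form).
    w, b: arrays of shape (2,), hypothetical scoring parameters.
    """
    stacked = np.stack([time_branch, freq_branch])  # shape (2, T)
    scores = w[:, None] * stacked + b[:, None]      # hypothetical scoring
    weights = softmax(scores, axis=0)               # sums to 1 over branches
    fused = (weights * stacked).sum(axis=0)         # convex combination
    return fused, weights

rng = np.random.default_rng(0)
t_out = rng.standard_normal(16000)  # placeholder for a time-domain estimate
f_out = rng.standard_normal(16000)  # placeholder for a frequency-domain estimate
fused, weights = fuse(t_out, f_out, w=np.array([0.5, -0.5]), b=np.zeros(2))
```

Because the weights at each sample are non-negative and sum to one, every fused sample lies between the two branch estimates. A learned attention module would replace the fixed `w` and `b` with trained parameters.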
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Generation] Generative Multimedia, [Experience] Interactions and Quality of Experience
Relevance To Conference: Audio super-resolution is closely related to the field of multimedia, where high-quality audio is a key component. Audio super-resolution improves the clarity and fidelity of audio, thereby enhancing the quality and appeal of multimedia content. Unlike previous single-domain audio super-resolution methods, this paper proposes a method that combines the time domain and the frequency domain. By fully utilizing the correlation between the time-domain and frequency-domain representations of acoustic signals and enhancing the complementarity of low-frequency and high-frequency components, the method produces super-resolution audio that provides a clearer and more realistic listening experience. It can bring a more immersive listening experience to multimedia applications such as music, movies, and games, and can promote the continued development of multimedia technology.
Supplementary Material: zip
Submission Number: 4258