FT-CSR: Cascaded Frequency-Time Method for Coded Speech Restoration

Liang Wen, Lizhong Wang, Yuxing Zheng, Weijing Shi, Kwang Pyo Choi

Published: 01 Jan 2024, Last Modified: 12 May 2025, ICME 2024, CC BY-SA 4.0

Abstract: Lossy speech codecs often introduce coding distortions, such as coding noise and constrained bandwidth, that can degrade the quality of the decoded speech. This paper proposes FT-CSR, a method for coded speech restoration. FT-CSR uses a cascaded frequency-time domain model to reduce coding noise and recover missing frequencies sequentially. In experiments with the Opus codec, FT-CSR was effective across bitrates from 8 to 16 kbps and outperformed the baseline on both objective and subjective measures. FT-CSR achieves a MOS-POLQA score of 3.6 or higher and improves MOS-POLQA by more than 0.23 compared with the decoded speech. Subjective test results show that FT-CSR improves MOS by more than 0.85 over the decoded speech.