Abstract: This study investigates the implications of using a psychoacoustic model (PAM) in a neural audio coder (NAC), focusing on the masking of quantization noise. We introduce a novel training strategy that incorporates the PAM into the NAC more accurately. The strategy uses a discriminator that measures the PAM loss either directly or indirectly. For the indirect measurement, a multi-scale STFT discriminator (MS-STFTD) adds an auxiliary loss term to the existing PAM loss. For the direct measurement, we design a multi-scale PAM discriminator (MS-PAMD) that quantifies PAM-specific parameters. Experimental results show that adding the discriminator masks quantization noise better than the previous NAC and yields audio quality comparable to commercial AAC in both objective and subjective scores.