Abstract: Mean Opinion Score (MOS) prediction is the task of automatically evaluating synthesized speech with a neural network that emulates a human listening test. Prior work on automatic MOS prediction has typically focused on mainstream languages, such as English, for which large amounts of data are available. For low-resource languages, however, no large-scale MOS prediction data exists, which hinders the study of those languages. In this paper, we propose a novel Multi-Perspective Transfer Learning (MPTL) training scheme, together with a new small-scale Mongolian MOS prediction dataset, MonMOS. MPTL includes Feature Transfer and Model Transfer, which transfer knowledge from mainstream languages to the low-resource language from different perspectives. Experimental results on MonMOS show that MPTL outperforms the standard direct training scheme with a classical architecture. We will release the pre-trained models and the MonMOS dataset at: https://github.com/Ai-S2-Lab/MPTL-MOS.