An Exploration on Singing MOS Prediction

Published: 2024, Last Modified: 08 Jan 2026 | ISCSLP 2024 | CC BY-SA 4.0
Abstract: Mean Opinion Score (MOS) evaluations are time-consuming and costly, which limits their use during the research process. In the field of speech, the rapid development of MOS prediction techniques has enabled the use of pseudo MOS scores for evaluating synthesized audio. In the singing domain, however, datasets and relevant research remain scarce. Consequently, we conduct an investigation into singing MOS prediction using the SingMOS dataset. By systematically examining the impact of different loss functions and various SSL models, we identify the most effective SSL-based backbone for singing MOS prediction. Furthermore, we investigate incorporating additional features, such as pitch, pitch variance, and judge, into our model. This comprehensive exploration leads to an optimized MOS predictor tailored specifically to singing MOS prediction tasks, providing a valuable reference for further research.