Subjective Evaluation of Text-to-Speech Models: Comparing Absolute Category Rating and Ranking by Elimination Tests

Kishor Kayyar; Christian Dittmar; Nicola Pia; Emanuel Habets

Published: 15 Jun 2023, Last Modified: 29 Jun 2023, SSW12, Readers: Everyone

Keywords: Speech Synthesis, text-to-speech, mean-opinion-score, ranking-by-elimination, subjective test

TL;DR: We conduct a ranking-by-elimination test to assess how effective absolute category rating tests, which yield mean opinion scores, are when evaluating perceptually similar speech synthesis models.

Abstract: Modern text-to-speech (TTS) models are typically evaluated subjectively using the Absolute Category Rating (ACR) method, which rates each model under test by its mean opinion score. However, if the models are perceptually too similar, assigning absolute ratings to stimuli can be difficult and prone to subjective preference errors. Pairwise comparison tests offer relative comparisons and better capture some of the subtle differences between stimuli. However, they take more time, because the number of comparisons grows quadratically with the number of models. Alternatively, a ranking-by-elimination (RBE) test can assess multiple models, offering benefits similar to those of pairwise comparisons for subtle differences across models, without the time penalty. We compared the ACR and RBE tests for TTS evaluation in a controlled experiment and found that the results were statistically similar even in the presence of perceptually close TTS models.
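To make the time cost of pairwise testing concrete, the sketch below counts the comparisons a full pairwise design would need, n(n-1)/2 for n models, and computes a mean opinion score as the arithmetic mean of listener ratings. The function names and the example ratings are hypothetical and chosen for illustration only; they are not taken from the paper, and the paper's actual test design and statistics may differ.

```python
from math import comb
from statistics import mean


def pairwise_comparisons(n_models: int) -> int:
    """Number of unordered model pairs in a full pairwise design: n(n-1)/2."""
    return comb(n_models, 2)


def mean_opinion_score(ratings):
    """Mean opinion score: the arithmetic mean of the listeners' ratings."""
    return mean(ratings)


if __name__ == "__main__":
    # The number of pairs grows quadratically with the number of models.
    for n in (2, 4, 8, 16):
        print(f"{n} models: {pairwise_comparisons(n)} pairwise comparisons")

    # Hypothetical ratings on a 1-5 scale for a single model.
    hypothetical_ratings = [4, 5, 3, 4, 4]
    print("MOS:", mean_opinion_score(hypothetical_ratings))
```

In an ACR test, by contrast, each stimulus is rated on its own, so the number of rated stimuli grows only linearly with the number of models.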
