Better Replacement for TTS Naturalness Evaluation

Published: 15 Jun 2023, Last Modified: 29 Jun 2023, SSW12, Readers: Everyone
Keywords: Text-To-Speech, Naturalness, Evaluation
TL;DR: Naturalness is not properly definable, so we propose better replacements for it
Abstract: Text-To-Speech (TTS) systems are commonly evaluated along two main dimensions: intelligibility and naturalness. While there are clear proxies for intelligibility, such as transcription Word-Error-Rate (WER), naturalness is not nearly so well defined. In this paper, we present the results of our attempt to learn which aspects human listeners consider when they are asked to evaluate the “naturalness” of TTS systems. We conducted a user study similar to common TTS evaluations and, at the end, asked each subject to define the sense of naturalness that they had used. We then coded their answers and conducted statistical analysis between codes to create a list of aspects that users consider part of naturalness. Based on this list, we suggest replacement questions to use instead of a single oblique notion of naturalness.