Pitch-Asynchronous Overlap-Add Waveform-Concatenation Speech Synthesis by Using a Phase-Optimizing Neural Network

Published: 01 Jan 2003, Last Modified: 24 May 2025, KES 2003, CC BY-SA 4.0
Abstract: The pitch-synchronous overlap-add (PSOLA) speech synthesis method is conventionally used for high-quality waveform concatenation. Its basis is the periodic structure of voiced speech, i.e., the pitchmarks. PSOLA-synthesized sound is of high quality as long as pitchmark detection succeeds, but it can degrade severely when detection fails or, more fundamentally, when the sound is an unvoiced consonant. In this paper, we propose a pitch-asynchronous waveform-concatenation speech synthesis method. It is based on adaptive phase optimization using complex-valued neural processing to maintain a desirable degree of pulse sharpness. Experimental results demonstrate successful generation of high-quality sound.
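For readers unfamiliar with overlap-add, the following is a minimal sketch of the generic operation only: windowed frames are placed at a fixed hop, with no reference to pitchmarks, and summed. It does not reproduce the paper's phase-optimizing complex-valued neural network. The function names (`hann`, `split_frames`, `overlap_add`), the frame size, the hop, and the test signal are all illustrative assumptions, not taken from the paper.

```python
import math


def hann(n):
    """Periodic Hann window of length n; shifted copies sum to 1 at 50% overlap."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(signal, size, hop):
    """Cut the signal into Hann-windowed frames taken every `hop` samples."""
    window = hann(size)
    return [
        [signal[start + i] * window[i] for i in range(size)]
        for start in range(0, len(signal) - size + 1, hop)
    ]


def overlap_add(frames, hop):
    """Sum equally sized frames placed every `hop` samples (no pitchmarks used)."""
    size = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + size)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


# Hypothetical test signal: a short sinusoid standing in for a speech segment.
signal = [math.sin(2.0 * math.pi * 5 * t / 64) for t in range(64)]
frames = split_frames(signal, size=16, hop=8)
rebuilt = overlap_add(frames, hop=8)
```

With a periodic Hann window and a hop of half the frame size, the overlapping windows sum to one, so the interior of `rebuilt` matches `signal`; only the first and last half-frames are attenuated. When frames from different source segments are concatenated at a fixed hop, their phases generally do not align, which is the kind of mismatch the abstract's phase optimization is meant to address.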