DBN-based Spectral Feature Representation for Statistical Parametric Speech Synthesis

Ya-Jun Hu, Zhen-Hua Ling

Published: 2016, Last Modified: 30 Sept 2024. IEEE Signal Process. Lett. 2016. CC BY-SA 4.0

Abstract: This letter presents a method of deriving spectral features using a deep belief network (DBN) for hidden Markov model (HMM)-based parametric speech synthesis. At training time, a DBN is estimated to model the high-dimensional spectral envelopes and is then used to transform them into binary codes. These DBN-based binary codes (DBCs) are used as spectral features for HMM modeling. At synthesis time, spectral envelopes are recovered from the predicted DBC sequences and then used for waveform reconstruction. Experimental results show that the proposed method achieves better naturalness than the conventional method, which uses mel-cepstra as spectral features and considers global variance (GV) during parameter generation.
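The encode and decode steps described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the letter: the layer sizes, the helper names (`make_layer`, `up`, `down`, `encode`, `decode`), the random untrained weights, the sigmoid visible units on an envelope scaled to [0, 1], and the 0.5 threshold used to binarize the top-layer activations. A real system would first train the stack layer by layer and would likely treat the real-valued envelope differently at the bottom layer.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_visible, n_hidden, rng):
    """Hypothetical RBM-style layer with small random (untrained) parameters."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
               for _ in range(n_visible)]
    return {"W": weights, "b_vis": [0.0] * n_visible, "b_hid": [0.0] * n_hidden}

def up(layer, v):
    """Mean-field upward pass: visible activations -> hidden probabilities."""
    W, b = layer["W"], layer["b_hid"]
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def down(layer, h):
    """Mean-field downward pass using the same (tied) weight matrix."""
    W, b = layer["W"], layer["b_vis"]
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]

def encode(layers, envelope):
    """Propagate an envelope up the stack and binarize the top layer.
    The 0.5 threshold is an assumption of this sketch."""
    act = envelope
    for layer in layers:
        act = up(layer, act)
    return [1 if p >= 0.5 else 0 for p in act]

def decode(layers, code):
    """Propagate a binary code back down the stack to envelope space."""
    act = [float(bit) for bit in code]
    for layer in reversed(layers):
        act = down(layer, act)
    return act

rng = random.Random(0)
layer_sizes = [16, 8, 4]  # toy sizes; real spectral envelopes are far larger
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
envelope = [rng.random() for _ in range(layer_sizes[0])]  # stand-in frame in [0, 1]

code = encode(layers, envelope)    # binary code for one frame
recovered = decode(layers, code)   # reconstruction in envelope space
```

With untrained weights the reconstruction is not meaningful; the sketch only shows the data flow, in which the binary code replaces the envelope as the modeled feature and the envelope is recovered from the code at synthesis time.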