A dynamic Gaussian process for voice conversion

Dong-Yan Huang, Minghui Dong, Haizhou Li

2013 (modified: 07 Apr 2022), ICME Workshops 2013

Abstract: In this paper, we explore learning techniques based on Dynamic Gaussian Processes (DGP) for voice conversion. In particular, we propose a dynamic Gaussian process with a squared exponential kernel, combined with sparse partial least squares (SPLS), to model nonlinearities and to capture the dynamics of the source data. These dynamics are modeled by concatenating each frame with its previous and next frames. Sparse partial least squares regression is used to find the mapping function while avoiding overfitting. The proposed dynamic GP-based technique is simple, efficient, and highly accurate, and it does not require extensive tuning. Experimental results show that the proposed approach achieves good similarity between the converted voices and the original target voices, and that it substantially improves sound quality compared with the state-of-the-art Gaussian mixture model (GMM)-based approach.
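The two ingredients named in the abstract, stacking each frame with its neighbors and regressing target features with a squared exponential GP, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The helper names (`stack_dynamic_features`, `gp_predict_mean`), the edge handling (a boundary frame reuses itself as the missing neighbor), and the fixed hyperparameter values are all hypothetical. The SPLS step is omitted entirely, and only the GP posterior mean is computed.

```python
import math


def stack_dynamic_features(frames):
    """Concatenate each frame with its previous and next frames.

    Hypothetical helper: edge frames reuse themselves as the missing neighbor.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        stacked.append(list(prev_frame) + list(frames[t]) + list(next_frame))
    return stacked


def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared exponential kernel: s^2 * exp(-||a - b||^2 / (2 l^2))."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(low)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict_mean(x_train, y_train, x_test,
                    length_scale=1.0, signal_var=1.0, noise_var=1e-2):
    """GP posterior mean k_*^T (K + noise_var I)^{-1} Y, one output dim at a time."""
    n = len(x_train)
    k = [[sq_exp_kernel(x_train[i], x_train[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(k)
    n_out = len(y_train[0])
    # One weight vector alpha = (K + noise I)^{-1} y per target dimension.
    alphas = [cholesky_solve(low, [y_train[i][d] for i in range(n)])
              for d in range(n_out)]
    predictions = []
    for x in x_test:
        k_star = [sq_exp_kernel(x, xi, length_scale, signal_var) for xi in x_train]
        predictions.append([sum(ks * a for ks, a in zip(k_star, alpha))
                            for alpha in alphas])
    return predictions


if __name__ == "__main__":
    # Toy 1-D "source" and "target" frames (made-up numbers for illustration only).
    source = [[0.0], [1.0], [2.0], [3.0]]
    target = [[1.0], [3.0], [5.0], [7.0]]
    x_dyn = stack_dynamic_features(source)
    print(gp_predict_mean(x_dyn, target, x_dyn, noise_var=1e-6))
```

In a real voice conversion system, the frames would be multi-dimensional spectral feature vectors, and the kernel hyperparameters would be estimated from data rather than fixed. Both points are left out of this sketch for brevity.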
