Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion

Ryunosuke Hirai; Yuki Saito; Hiroshi Saruwatari

Published: 15 Jun 2023, Last Modified: 27 Jun 2023, SSW12, Readers: Everyone

Keywords: many-to-many voice conversion, federated learning, human-in-the-loop, distributed machine learning, StarGANv2-VC

Abstract: We propose a method for training a many-to-many voice conversion (VC) model that can additionally learn users' voices while protecting the privacy of their data. Conventional many-to-many VC methods train a VC model using a publicly available or proprietary multi-speaker corpus. However, such models do not always achieve high-quality VC for input speech from diverse users. Our method is based on federated learning, a framework of distributed machine learning in which a developer and users cooperatively train a machine learning model while protecting the privacy of user-owned data. We present a proof-of-concept method based on StarGANv2-VC, which we call Fed-StarGANv2-VC, and demonstrate that it can achieve speaker similarity comparable to that of conventional non-federated StarGANv2-VC.
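
The abstract does not say how user-side updates are combined, so the following is only a minimal sketch of one common choice in federated learning: federated averaging, where the developer averages locally trained parameters, weighted by each user's sample count. The function name `federated_average`, the flat parameter lists, and the toy numbers are all hypothetical and are not taken from Fed-StarGANv2-VC.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: each client sends a flat list of floats of equal length.
    This is a generic FedAvg-style step, not the paper's stated procedure.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client updates must have the same length")
        for i in range(dim):
            # Each client contributes in proportion to its share of the data.
            averaged[i] += weights[i] * size / total
    return averaged


# Hypothetical toy round: two users send locally updated parameters;
# their raw speech recordings never leave their devices.
updates = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
new_global = federated_average(updates, sizes)
print(new_global)
```

In this sketch only model parameters are exchanged, which is the property the abstract relies on for protecting user-owned data. A real system would apply the same idea per tensor of the VC model rather than to a flat list.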

Supplementary Material: zip
