Federated Learning for Human-in-the-Loop Many-to-Many Voice Conversion

Published: 15 Jun 2023, Last Modified: 27 Jun 2023 | SSW12 | Readers: Everyone
Keywords: many-to-many voice conversion, federated learning, human-in-the-loop, distributed machine learning, StarGANv2-VC
Abstract: We propose a method for training a many-to-many voice conversion (VC) model that can additionally learn users' voices while protecting the privacy of their data. Conventional many-to-many VC methods train a VC model on a publicly available or proprietary multi-speaker corpus. However, the resulting models do not always achieve high-quality VC for input speech from diverse users. Our method is based on federated learning, a distributed machine learning framework in which a developer and users cooperatively train a model while the user-owned data remains private. We present a proof-of-concept method based on StarGANv2-VC, called Fed-StarGANv2-VC, and demonstrate that it can achieve speaker similarity comparable to that of conventional non-federated StarGANv2-VC.
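Illustrative sketch (not from the paper): the abstract does not state which aggregation rule Fed-StarGANv2-VC uses, so the example below assumes plain federated averaging (FedAvg) and substitutes a toy two-parameter linear model for the VC network. The names `local_update`, `federated_average`, and `run_rounds` are hypothetical. It only shows the general pattern in which each user trains locally on private data and the developer receives model weights, never the data itself.

```python
from typing import List, Tuple

Weights = List[float]
Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one user


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Client side: SGD on a toy model y = w0 + w1 * x (squared error).

    Assumption: this stands in for a user's local training of the VC model.
    """
    w0, w1 = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w0 + w1 * x) - y
            w0 -= lr * err
            w1 -= lr * err * x
    return [w0, w1]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Server side: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_rounds(global_weights: Weights, clients: List[Dataset],
               rounds: int = 100) -> Weights:
    """Alternate local training and aggregation; raw data never leaves a client."""
    for _ in range(rounds):
        updates = [local_update(list(global_weights), data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


if __name__ == "__main__":
    # Two users whose private samples both follow y = 1 + 2x.
    clients = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (0.5, 2.0)]]
    print(run_rounds([0.0, 0.0], clients))
```

In this sketch only the weight vectors cross the client boundary. Sharing weights alone is not a formal privacy guarantee; whether the paper adds further protection is not stated in the abstract.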
Supplementary Material: zip