Federated Learning with Differential Privacy for End-to-End Speech Recognition

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: federated learning, differential privacy, speech recognition, transformers
TL;DR: We obtain practical federated learning models for E2E ASR and establish the first baselines for federated learning with differential privacy in ASR.
Abstract: While federated learning (FL) has recently emerged as a promising approach to train machine learning models, its application to automatic speech recognition (ASR) remains limited to preliminary explorations. Moreover, FL does not inherently guarantee user privacy and requires the use of differential privacy (DP) for robust privacy guarantees. However, we are not aware of prior work on applying DP to FL for ASR. In this paper, we aim to bridge this research gap by formulating an ASR benchmark for FL with DP and establishing the first baselines. First, we extend the existing research on FL for ASR by exploring different aspects of recent *large end-to-end transformer models*: architecture design, seed models, data heterogeneity, domain shift, and impact of cohort size. With a *practical* number of central aggregations, we are able to train **FL models** that are **nearly optimal** even with heterogeneous data, a seed model from another domain, or no pre-trained seed model. Second, we apply DP to FL for ASR, which is non-trivial since DP noise severely affects model training, especially for large transformer models, due to highly imbalanced gradients in the attention block. We counteract the adverse effect of DP noise by reviving per-layer clipping and explaining why its effect is more apparent in our case than in prior work. Remarkably, we achieve user-level ($7.2$, $10^{-9}$)-**DP** (resp. ($4.5$, $10^{-9}$)-**DP**) with a 1.3\% (resp. 4.6\%) absolute drop in the word error rate for extrapolation to high (resp. low) population scale for **FL with DP in ASR**.
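To make the per-layer clipping idea concrete, below is a minimal sketch (not the authors' implementation) of one round of DP federated averaging in which each client's model update is clipped layer by layer before aggregation, and Gaussian noise calibrated to the combined clipping bound is added to the sum. Names such as `layer_clip_norms` and `noise_multiplier` are illustrative assumptions.

```python
# Illustrative sketch of user-level DP-FedAvg with per-layer clipping.
# Assumed conventions (not from the paper): updates are dicts mapping layer
# names to numpy arrays, and each layer has its own L2 clipping bound.
import numpy as np


def clip_per_layer(update, layer_clip_norms):
    """Clip each layer's update to its own L2 norm bound."""
    clipped = {}
    for name, delta in update.items():
        norm = np.linalg.norm(delta)
        bound = layer_clip_norms[name]
        clipped[name] = delta * min(1.0, bound / (norm + 1e-12))
    return clipped


def dp_fedavg_round(client_updates, layer_clip_norms, noise_multiplier, rng):
    """Aggregate one cohort of client updates with user-level DP noise."""
    cohort_size = len(client_updates)
    clipped = [clip_per_layer(u, layer_clip_norms) for u in client_updates]
    # The per-user L2 sensitivity of the sum is bounded by the norm of the
    # vector of per-layer clip bounds.
    total_clip = np.sqrt(sum(c ** 2 for c in layer_clip_norms.values()))
    aggregate = {}
    for name in layer_clip_norms:
        summed = sum(c[name] for c in clipped)
        noise = rng.normal(0.0, noise_multiplier * total_clip,
                           size=summed.shape)
        aggregate[name] = (summed + noise) / cohort_size
    return aggregate
```

In this sketch, the noise multiplier would be chosen (e.g., via a privacy accountant) to target a desired ($\epsilon$, $\delta$)-DP guarantee for the given cohort size, population size, and number of central aggregation rounds.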
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2839