Flashback: Understanding and Mitigating Forgetting in Federated Learning

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Federated Learning, Forgetting, Knowledge Distillation, Deep Learning, Continual Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We tackle the forgetting challenge in Federated Learning caused by data heterogeneity, introducing a novel metric to measure it and proposing Flashback, an FL algorithm that uses dynamic distillation to accelerate convergence.
Abstract: In the realm of Federated Learning (FL), the convergence and effectiveness of learning algorithms can be severely hampered by the phenomenon of forgetting, where knowledge obtained in one round becomes diluted or lost in subsequent rounds. This challenge stems from severe data heterogeneity across clients. Although FL algorithms like FedAvg have been pivotal, they often falter in scenarios of high data heterogeneity. This work delves into the nuances of this problem, establishing the critical role forgetting plays in the inefficient learning of FL under severe data heterogeneity. Knowledge loss occurs in both the local update and the aggregation step; addressing one phase without considering the other does not mitigate forgetting. We introduce a novel metric that offers a granular measurement of forgetting at every round while ensuring that the occurrence of forgetting is distinctly recognized and not obscured by the simultaneous acquisition of new class-specific knowledge. Leveraging these insights, we propose Flashback, an FL algorithm that integrates a novel dynamic distillation approach. The knowledge of the different models is estimated, and the distillation loss is adapted accordingly. This adaptive distillation is applied at both the local and global update phases, ensuring models retain essential knowledge across rounds while also assimilating new knowledge. Our approach robustly mitigates the detrimental effects of forgetting, paving the way for more efficient and consistent FL algorithms, especially in environments of high data heterogeneity. By effectively mitigating forgetting, Flashback converges to the target accuracy faster than the baselines, being at least 4.6$\times$ and up to 88.5$\times$ faster across the different benchmarks.
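
The two technical ingredients summarized above, a per-class forgetting measure that is not masked by newly acquired class knowledge, and a distillation loss whose weight follows a model's estimated knowledge, can be illustrated with a short sketch. The code below is not the paper's Flashback implementation; it is a minimal sketch assuming a standard PyTorch classification setup, and all names (`per_class_accuracy`, `forgetting`, `kd_weighted_loss`) as well as the choice of per-class accuracy as the knowledge estimate are hypothetical.

```python
# Illustrative sketch only; not the authors' metric or algorithm.
import torch
import torch.nn.functional as F

def per_class_accuracy(model, loader, num_classes, device="cpu"):
    """Estimate a model's 'knowledge' as its accuracy on each class."""
    correct = torch.zeros(num_classes)
    total = torch.zeros(num_classes)
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            preds = model(x.to(device)).argmax(dim=1).cpu()
            for c in range(num_classes):
                mask = y == c
                total[c] += mask.sum()
                correct[c] += (preds[mask] == c).sum()
    return correct / total.clamp(min=1)

def forgetting(acc_before, acc_after):
    """Sum only the per-class accuracy *drops* between two rounds, so gains
    on other classes cannot hide the class-specific knowledge that was lost."""
    return torch.clamp(acc_before - acc_after, min=0).sum().item()

def kd_weighted_loss(student_logits, teacher_logits, targets,
                     teacher_class_acc, T=2.0):
    """Cross-entropy plus a distillation term whose per-sample weight follows
    the teacher's estimated knowledge of that sample's class, so the student
    imitates the teacher only where the teacher is actually competent."""
    ce = F.cross_entropy(student_logits, targets)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kd_per_sample = F.kl_div(log_p_student, p_teacher,
                             reduction="none").sum(dim=1)
    weights = teacher_class_acc[targets]  # knowledge-based weighting
    return ce + (T * T) * (weights * kd_per_sample).mean()
```

In this reading, the same weighted-distillation idea would be applied both in the clients' local updates (global model as teacher) and when updating the aggregated global model (client models as teachers), which is how the abstract describes distillation being used in both phases.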
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8786