HePCo: Data-Free Heterogeneous Prompt Consolidation for Continual Federated Learning

Shaunak Halbe; James Seale Smith; Junjiao Tian; Zsolt Kira

Published: 28 Oct 2023, Last Modified: 15 Dec 2023

Venue: FL@FM-NeurIPS’23 Oral

Student Author Indication: Yes

Keywords: federated learning, continual learning, prompt tuning, foundation models, knowledge distillation, heterogeneity

TL;DR: We propose a prompt tuning and aggregation scheme leveraging foundation models and a lightweight data-free distillation mechanism to tackle forgetting and heterogeneity in continual federated learning

Abstract: We focus on the important yet understudied problem of Continual Federated Learning (CFL), in which a server communicates with a set of clients to incrementally learn new concepts over time without sharing or storing any data. The problem combines challenges from both continual learning and federated learning: models trained in a CFL setup suffer from catastrophic forgetting, which is exacerbated by data heterogeneity across clients. Existing approaches tend to impose large overheads on clients and communication channels, or require access to stored data, which makes them unsuitable for real-world use because of privacy concerns. We study this problem in the context of foundation models and show that they are effective at mitigating forgetting while minimizing overhead costs and without requiring access to any stored data. We achieve this with a prompting-based approach, in which only prompts and classifier heads have to be communicated, and a novel, lightweight generation and distillation scheme that aggregates client models at the server. We formulate the problem for image classification, establish strong baselines for comparison, and conduct experiments on CIFAR-100 as well as challenging, large-scale datasets such as ImageNet-R and DomainNet. Our approach outperforms both existing methods and our own baselines by more than 7% while significantly reducing communication and client-level computation costs.
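
To make the communication pattern in the abstract concrete, the following is a minimal sketch in which each client sends only its prompt and classifier-head parameters and the server combines them. The aggregation shown is plain weighted parameter averaging, used here purely as a simple stand-in; it is not the generation and distillation scheme the abstract proposes. The parameter names (`backbone.w`, `prompt.p`, `head.w`), the prefix convention, and the client weights are all hypothetical.

```python
from typing import Dict, List, Tuple

Params = Dict[str, List[float]]


def communicated_subset(model: Params,
                        prefixes: Tuple[str, ...] = ("prompt", "head")) -> Params:
    """Keep only the parameters a client would send (assumed naming scheme)."""
    return {name: values for name, values in model.items()
            if name.startswith(prefixes)}


def weighted_average(updates: List[Params], weights: List[float]) -> Params:
    """Naive baseline: element-wise weighted mean of client parameters.

    This is NOT the distillation-based aggregation described in the abstract.
    """
    total = sum(weights)
    return {
        name: [
            sum(w * u[name][i] for u, w in zip(updates, weights)) / total
            for i in range(len(updates[0][name]))
        ]
        for name in updates[0]
    }


# Hypothetical toy clients: the frozen backbone is never communicated.
client_a = {"backbone.w": [0.5, 0.5], "prompt.p": [1.0, 2.0], "head.w": [0.0, 4.0]}
client_b = {"backbone.w": [0.5, 0.5], "prompt.p": [3.0, 6.0], "head.w": [2.0, 0.0]}

updates = [communicated_subset(client_a), communicated_subset(client_b)]
server_params = weighted_average(updates, weights=[1.0, 3.0])
```

In this toy setup the backbone weights never leave the clients, which is the source of the communication savings the abstract refers to. Averaging is only one possible server-side step; the abstract's point is that a distillation-based aggregation handles heterogeneous clients better than such baselines.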

Submission Number: 13
