Abstract: Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks with limited data. Its parameter efficiency makes it particularly suitable for Federated Learning (FL), where both communication and computation budgets are often constrained. However, global prompt tuning struggles to generalize across heterogeneous clients, while personalized tuning overfits to local data and sacrifices generalization. We propose PEP-FedPT (Prompt Estimation from Prototypes for Federated Prompt Tuning), a unified framework designed to achieve both generalization and personalization in federated prompt tuning of ViTs. Within this framework, we introduce the novel Class-Contextualized Mixed Prompt (CCMP), built from class-specific prompts maintained alongside a globally shared prompt. For each input, CCMP adaptively combines the class-specific prompts using weights derived from global class prototypes and client class priors. This approach enables per-sample prompt personalization without storing client-dependent trainable parameters. The prompts are collaboratively optimized via standard federated averaging. Comprehensive evaluations on CIFAR-100, TinyImageNet, DomainNet, and iNaturalist demonstrate that PEP-FedPT consistently surpasses state-of-the-art baselines under diverse data-heterogeneity scenarios, establishing a strong foundation for efficient and generalizable federated prompt tuning of Vision Transformers.
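To make the mechanism described in the abstract concrete, here is a minimal PyTorch sketch of how a per-sample mixed prompt could be estimated from global class prototypes and client class priors. All names (`ccmp_prompt`, `tau`, the prior-weighted softmax) are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ccmp_prompt(feat, prototypes, class_prior, class_prompts, shared_prompt, tau=1.0):
    """Sketch of a Class-Contextualized Mixed Prompt (CCMP) estimate.

    feat:          (d,)      per-sample feature (e.g., an intermediate [CLS] embedding)
    prototypes:    (C, d)    globally shared class prototypes
    class_prior:   (C,)      client-side class prior probabilities
    class_prompts: (C, n, d) one n-token prompt per class
    shared_prompt: (n, d)    globally shared prompt
    """
    # Similarity of the sample to each global class prototype.
    sims = F.cosine_similarity(feat.unsqueeze(0), prototypes, dim=-1)  # (C,)
    # Fold in the client's class prior, then normalize with a
    # temperature-scaled softmax to obtain per-class mixing weights.
    weights = F.softmax(sims / tau + torch.log(class_prior + 1e-8), dim=0)
    # Convex combination of class-specific prompts, combined with the shared prompt.
    mixed = torch.einsum("c,cnd->nd", weights, class_prompts)
    return shared_prompt + mixed
```

Because the weights are computed on the fly from shared prototypes and local priors, no client-specific trainable parameters need to be stored, matching the claim in the abstract.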
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: **As suggested by the reviewer AyXx**
Refined the explanation of the method in Section 4.1 (pages 5-6), including corrections of minor typographical errors
Clarified the assumptions, especially A.3 (page 8)
Refined the statement of the proposition to accommodate the impact of assumption A.3 (page 8)
Clearly stated the limitation and impact of assumption A.3 on the analysis in the limitations section (page 12)
Added a detailed explanation of the modelling of p_k(cls_{l-1}, p) in Sec. A.6.2 of the appendix (page 29)
**Changes suggested by reviewer rvUd**
Added a discussion of privacy on page 7
Added related work on prompt tuning for domain adaptation (DA) on page 3
Expanded the discussion of limitations (page 8)
Added the complete hyperparameter settings in Sec. A.3.2 (page 19)
Expanded the discussion of differential privacy (DP) in Sec. A.4.1 (page 20)
**Changes suggested by reviewer 72NJ**
Clarified the iNaturalist metric on page 10; additional experiments are in Sec. A.4.8 (pages 22-23)
Clarified the low accuracy of the pFedPG baseline (pages 10-11)
Improved the explanation of the mechanism overview in the introduction (page 2)
Expanded the limitations discussion (page 12)
Added robustness to varying Dirichlet concentration in Sec. A.5.5 (pages 25-26)
Added sensitivity analysis for the temperature in Sec. A.5.6 (pages 25-26)
Added the impact of temperature on prompt insertion in Sec. A.5.7 (page 26)
Added the personalization and generalization trade-off analysis in Sec. A.5.8 (pages 26-27)
Assigned Action Editor: ~Konstantin_Mishchenko1
Submission Number: 6062