A Closer Look at Personalized Fine-Tuning in Heterogeneous Federated Learning

TMLR Paper 6526 Authors

16 Nov 2025 (modified: 13 Jan 2026) · Rejected by TMLR · CC BY 4.0
Abstract: Federated Learning (FL) enables decentralized, privacy-preserving model training but struggles to balance global generalization and local personalization due to non-identical data distributions across clients. Personalized Fine-Tuning (PFT), a popular post-hoc solution, fine-tunes the final global model locally but often overfits to skewed client distributions or fails under domain shifts. We propose adapting Linear Probing followed by full Fine-Tuning (LP-FT)—a principled centralized strategy for alleviating feature distortion—to the FL setting. Through systematic evaluation across seven datasets and six PFT variants, we demonstrate LP-FT’s superiority in balancing personalization and generalization. Our analysis uncovers federated feature distortion, a phenomenon where local fine-tuning destabilizes globally learned features, and theoretically characterizes how LP-FT mitigates this via phased parameter updates. We further establish conditions (e.g., partial feature overlap, covariate-concept shift) under which LP-FT outperforms standard fine-tuning, offering actionable guidelines for deploying robust personalization in FL.
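As a rough illustration of the LP-FT strategy named in the abstract, here is a minimal sketch on a toy two-layer linear model. The two-phase structure (head-only training, then full updates) follows the standard LP-FT recipe; the toy model, the function name `lp_ft`, and all hyperparameters are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def lp_ft(X, y, W, v, lp_steps=200, ft_steps=200, lr=0.02):
    """Sketch of LP-FT personalization on a toy two-layer linear model.

    Phase 1 (linear probing): the feature extractor W is frozen and only
    the head v is trained, leaving globally learned features untouched.
    Phase 2 (fine-tuning): all parameters are updated, starting from the
    probed head.
    """
    W, v = W.copy(), v.copy()
    n = len(X)
    for _ in range(lp_steps):                 # Phase 1: head only
        feats = X @ W                         # frozen federated features
        err = feats @ v - y                   # squared-loss residual
        v -= lr * feats.T @ err / n
    for _ in range(ft_steps):                 # Phase 2: full model
        feats = X @ W
        err = feats @ v - y
        v -= lr * feats.T @ err / n
        W -= lr * X.T @ np.outer(err, v) / n  # features now move too
    return W, v
```

Because the fine-tuning phase starts from an already-fitted head, the initial residual is small and so is the early gradient on `W`, which is the usual intuition for why phased updates limit distortion of pretrained (here, federated) features.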
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Following the rebuttal and the reviewers' feedback, we have incorporated the following revisions into the manuscript:

**1. Clarification of Scope: PFT vs. Process-Integrated PFL.** We added clarifying explanations in the Introduction distinguishing *process-integrated personalized federated learning (PFL)* methods from *post-hoc personalized fine-tuning (PFT)*, which is the primary focus of this paper. We explicitly state that our goal is not to propose a new process-integrated PFL algorithm, but rather to rigorously characterize **plug-and-play, post-hoc personalization applied after a fixed global training stage**, and to establish LP-FT as a strong and practical baseline in this setting.

**2. Additional Comparisons with Process-Integrated PFL Methods.** To further contextualize the effectiveness of post-hoc PFT, we added new experiments comparing **FedAvg + LP-FT** against representative process-integrated PFL methods on the **CelebA** dataset (see **Table 8**), including FedBN, PerAvg, FedNova, FedRep, FedSoup, pFedFDA, and FedL2G. Despite operating purely as a post-hoc layer on top of FedAvg, LP-FT achieves the best performance across all three metrics, outperforming even the strongest process-integrated PFL methods in both local and global accuracy. These results reinforce LP-FT as a **plug-and-play baseline** that preserves global generalization while delivering effective personalization. In addition, we expanded **Appendix A.1 (Related Work)** with a broader discussion comparing PFT and PFL methods, as suggested by Reviewer 6AJx.

**3. Expanded Analysis of Parameter-Efficient Fine-Tuning (PEFT).** We revised and completed **Section D.1, *Exploration of Parameter-Efficient Fine-Tuning under Federated Feature Distortion***, which investigates whether PEFT methods can implicitly mitigate the distortion phenomenon identified in this work. Based on experiments with **LoRA** and **Adapter** on **DomainNet with ViT**, Table 7 shows that while PEFT methods achieve strong local accuracy, they suffer substantial drops in global accuracy, indicating significant distortion of shared federated features. **Note:** To improve cohesion, we added a complementary discussion in the Empirical Results section of the **main manuscript** (last paragraph of Sec. 3.4) summarizing the PEFT and PFL comparisons and clarifying their role relative to the main focus of the paper.

**4. Clarification of Theoretical Assumptions and Positioning.** We expanded the theoretical discussion to better clarify the scope and intent of our analysis. In particular, we added explicit discussion, motivated by the rebuttal, on the use of **simplified two-layer models** to obtain tractable insights under heterogeneity (see the paragraph following **Assumption 4.2**). We emphasize that our objective is not to provide architecture-agnostic guarantees, but rather to identify the **core mechanisms driving federated feature distortion** within this setting. We further clarified the role of the **isotropy assumption** in analyzing concept shift (see the paragraphs preceding **Lemma 4.3**) and the rationale behind the shared feature-extractor initialization.

**5. Local Versions of Theorems 4.4 and 4.5.** We added local-performance counterparts of **Theorems 4.4 and 4.5**, showing that analogous results hold at the client level. These results are formalized in **Remark 4.7** and **Corollaries E.1 and E.2**.

**6. Dedicated Discussion of Label Shift.** We added a dedicated discussion clarifying the role of **label shift** in the context of federated feature distortion. We explain why label shift does not directly instantiate the feature-distortion mechanism studied in our theory and is therefore not analyzed theoretically. Nevertheless, label-shift scenarios are included in our empirical evaluation and are further discussed in the **Limitations** section.

**7. Precise Definitions of Evaluation Metrics.** To improve the clarity of the experimental setup, we added precise definitions of local and global accuracy in Section 3.3 (page 5), complementing the prior verbal explanation.

**8. Proof Clarifications.** We added several clarifying sentences to the final part of the proof of **Theorem 4.5**, as requested by Reviewer WNJW, to improve readability and understanding.

Finally, we thank the reviewers again for their thorough feedback, which has directly strengthened the paper. We would be happy to address any remaining questions or concerns.
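The local- and global-accuracy metrics mentioned in the change log are defined precisely in Section 3.3 of the paper, which is not reproduced on this page. As a rough sketch, one common convention in the PFT literature evaluates each personalized model both on its own client's test split (local accuracy) and on the pooled test data of all clients (global accuracy), averaging over clients. The function below illustrates that convention only; its name and the pooling choice are assumptions, not the paper's definitions:

```python
import numpy as np

def local_global_accuracy(models, client_data):
    """Illustrative metric sketch (assumed convention, not the paper's exact one).

    models:      one personalized classifier per client, called as model(X) -> labels
    client_data: list of (X_test, y_test) pairs, one per client
    """
    # Pooled test set used for the global-accuracy evaluation.
    X_all = np.concatenate([X for X, _ in client_data])
    y_all = np.concatenate([y for _, y in client_data])
    local_accs, global_accs = [], []
    for model, (X_i, y_i) in zip(models, client_data):
        local_accs.append(np.mean(model(X_i) == y_i))       # own client's split
        global_accs.append(np.mean(model(X_all) == y_all))  # all clients' data
    return float(np.mean(local_accs)), float(np.mean(global_accs))
```

Under concept shift (e.g., clients with flipped decision rules), a personalized model can score perfect local accuracy while its global accuracy collapses, which is exactly the personalization-generalization tension these two metrics are designed to expose.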
Assigned Action Editor: ~Xiaofeng_Cao1
Submission Number: 6526