Privacy-Preserving Federated Vision Transformer Learning Leveraging Lightweight Homomorphic Encryption in Medical AI

Al Amin, Kamrul Hasan, Liang Hong, Sharif Ullah

Published: 2026, Last Modified: 25 May 2026. Venue: ICNC 2026. License: CC BY-SA 4.0

Abstract: Collaborative machine learning across healthcare institutions promises improved diagnostic accuracy by leveraging diverse datasets, yet privacy regulations such as HIPAA prohibit direct patient data sharing. While federated learning (FL) enables decentralized training without raw data exchange, recent studies demonstrate that model gradients in conventional FL remain vulnerable to reconstruction attacks, potentially exposing sensitive medical information. This paper presents a privacy-preserving federated learning framework combining Vision Transformers (ViT) with homomorphic encryption (HE) for secure multi-institutional histopathology classification. The approach uses the ViT’s [CLS] token, a compact 768-dimensional feature representation, as the unit of secure aggregation, encrypting these tokens with CKKS homomorphic encryption before transmission to the server. We demonstrate that encrypting [CLS] tokens achieves a 30-fold communication reduction compared to gradient encryption while maintaining strong privacy guarantees. Through evaluation on a three-client federated setup for lung cancer histopathology classification, we show that gradients are highly susceptible to model inversion attacks (PSNR: 52.26 dB, SSIM: 0.999, NMI: 0.741), enabling near-perfect image reconstruction. In contrast, the proposed CLS-protected HE approach prevents such attacks while enabling encrypted inference directly on ciphertexts, requiring only 326 KB of encrypted data transmission per aggregation round. The framework achieves 96.12% global classification accuracy in the unencrypted domain and 90.02% in the encrypted domain.
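
The aggregation flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `encrypt` and `decrypt` functions are hypothetical identity placeholders standing in for a CKKS library, the client tokens are random vectors rather than ViT outputs, and averaging is assumed as the aggregation rule (the abstract does not specify it). The sketch only relies on the fact that CKKS supports slot-wise addition and multiplication by a plaintext scalar, which is enough to compute a mean over encrypted vectors.

```python
import random

DIM = 768          # [CLS] token width stated in the abstract
NUM_CLIENTS = 3    # three-client setup stated in the abstract


def encrypt(vec):
    """Hypothetical placeholder for CKKS encryption (identity here)."""
    return list(vec)


def decrypt(ct):
    """Hypothetical placeholder for CKKS decryption (identity here)."""
    return list(ct)


def add_ciphertexts(a, b):
    # Mimics slot-wise ciphertext addition on plain Python lists.
    return [x + y for x, y in zip(a, b)]


def scale_ciphertext(ct, factor):
    # Mimics multiplication of a ciphertext by a plaintext scalar.
    return [x * factor for x in ct]


def aggregate(ciphertexts):
    # Assumed aggregation rule: element-wise mean over all clients.
    total = ciphertexts[0]
    for ct in ciphertexts[1:]:
        total = add_ciphertexts(total, ct)
    return scale_ciphertext(total, 1.0 / len(ciphertexts))


# Random stand-ins for the per-client [CLS] tokens (assumption, not real data).
rng = random.Random(0)
client_tokens = [
    [rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_CLIENTS)
]

# Each client encrypts locally; the server only sees the encrypted vectors.
encrypted = [encrypt(t) for t in client_tokens]
global_token = decrypt(aggregate(encrypted))
```

In a real deployment the placeholders would be replaced by calls into a CKKS implementation, and the decrypted result would match the plaintext mean only approximately, since CKKS arithmetic is approximate by design.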

External IDs: dblp:conf/iccnc/AminHHU26