Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Keywords: Endoscopic Video Analysis, Federated Learning, Foundation Models, Surgical Data Science, Vision Transformers
TL;DR: FL-EndoViT uses federated learning with adaptive optimization to train robust, privacy-preserving surgical foundation models that perform comparably to centralized baselines without sharing patient data.
Abstract: Purpose: Data privacy regulations hinder the creation of generalizable foundation models (FMs) for surgery by preventing multi-institutional data aggregation. This study investigates federated learning (FL) as a privacy-preserving solution to collaboratively train robust surgical FMs.
Methods: We introduce Federated EndoViT (FL-EndoViT), which adapts the Masked Autoencoder (MAE) pretraining strategy for FL, enhanced with adaptive Sharpness-Aware Minimization (FedSAM) to manage surgical data heterogeneity. FL-EndoViT is pretrained on the large-scale Endo700k dataset and evaluated against a centralized baseline on downstream tasks including scene segmentation, action recognition, and phase recognition.
Results: FedSAM is critical for successful pretraining, overcoming the convergence failures of standard federated methods. The resulting FL-EndoViT performs comparably to its centralized counterpart, with significant advantages in data-scarce, high-resolution segmentation and generalization to new surgical events. We also establish that full, end-to-end fine-tuning is necessary for optimal performance.
Conclusion: This work establishes FL with adaptive optimization as a viable paradigm for creating robust, privacy-preserving surgical FMs. Our findings provide a scalable framework for collaborative Surgical Data Science and underscore the optimizer's critical role in handling data heterogeneity. Future work should explore video-based models to incorporate spatiotemporal dynamics.
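Illustrative sketch (not the authors' implementation): the Methods combine local sharpness-aware updates with server-side aggregation. The toy example below shows one way such a round could look, under loudly stated assumptions: a linear least-squares loss stands in for the MAE reconstruction objective, plain (non-adaptive) SAM stands in for the adaptive variant named in the abstract, aggregation is size-weighted FedAvg, and all function names and hyperparameters (`sam_step`, `fedavg`, `federated_round`, `lr`, `rho`, `local_steps`) are hypothetical.

```python
import math


def grad(w, data):
    """Gradient of the mean loss 0.5 * (w.x - y)^2 over one client's data."""
    g = [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g


def sam_step(w, data, lr=0.1, rho=0.05):
    """One plain SAM update (assumed stand-in for the adaptive variant).

    1. Move to a perturbed point w + rho * g / ||g|| (ascent direction).
    2. Take the gradient there and apply it to the original weights.
    """
    g = grad(w, data)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid division by zero
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad(w_adv, data)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def fedavg(client_weights, client_sizes):
    """Average client weights, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def federated_round(w_global, clients, local_steps=5):
    """One communication round: local SAM steps per client, then FedAvg."""
    local_models = []
    for data in clients:
        w = list(w_global)  # each client starts from the current global model
        for _ in range(local_steps):
            w = sam_step(w, data)
        local_models.append(w)
    return fedavg(local_models, [len(d) for d in clients])


if __name__ == "__main__":
    # Two toy "institutions" with differently distributed inputs; no raw
    # samples leave a client, only model weights are exchanged.
    clients = [
        [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)],
        [([1.0, 1.0], 1.0), ([2.0, 0.0], 4.0), ([0.0, 2.0], -2.0)],
    ]
    w = [0.0, 0.0]
    for _ in range(30):
        w = federated_round(w, clients)
    print(w)
```

In this sketch only weight vectors cross the client boundary, which mirrors the privacy argument of the abstract; the actual model, optimizer settings, and aggregation details should be taken from the linked repository rather than from this toy.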
Primary Subject Area: Federated Learning
Secondary Subject Area: Foundation Models
Registration Requirement: Yes
Reproducibility: https://github.com/KirchnerMax/FL-EndoViT
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 148