Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Keywords: Endoscopic Video Analysis, Federated Learning, Foundation Models, Surgical Data Science, Vision Transformers
TL;DR: FL-EndoViT uses federated learning with adaptive optimization to train robust, privacy-preserving surgical foundation models that perform comparably to centralized baselines without sharing patient data.
Abstract: Purpose: Data privacy regulations hinder the creation of generalizable foundation models (FMs) for surgery by preventing multi-institutional data aggregation. This study investigates federated learning (FL) as a privacy-preserving solution to collaboratively train robust surgical FMs.
Methods: We introduce Federated EndoViT (FL-EndoViT), which adapts the Masked Autoencoder (MAE) pretraining strategy to FL and adds adaptive Sharpness-Aware Minimization (FedSAM) to manage the heterogeneity of surgical data. Pretrained on the large-scale Endo700k dataset, FL-EndoViT is evaluated against a centralized baseline on downstream tasks including scene segmentation, action recognition, and phase recognition.
Results: FedSAM is critical for successful pretraining, overcoming the convergence failures of standard federated methods. The resulting FL-EndoViT performs comparably to its centralized counterpart, with significant advantages in data-scarce, high-resolution segmentation and generalization to new surgical events. We also establish that full, end-to-end fine-tuning is necessary for optimal performance.
Conclusion: This work establishes FL with adaptive optimization as a viable paradigm for creating robust, privacy-preserving surgical FMs. Our findings provide a scalable framework for collaborative Surgical Data Science and underscore the optimizer's critical role in handling data heterogeneity. Future work should explore video-based models to incorporate spatiotemporal dynamics.
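To make the optimization idea in the Methods concrete, the following is a minimal sketch, not the authors' implementation, of how a Sharpness-Aware Minimization step can be combined with size-weighted federated averaging. It uses plain (non-adaptive) SAM on a toy quadratic loss per client rather than MAE pretraining of a Vision Transformer. All function names, the learning rate, the perturbation radius rho, and the toy client losses are assumptions chosen for illustration.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One plain SAM step on a flat parameter vector (illustrative only)."""
    g = grad_fn(w)
    # Move to the approximate worst-case point within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed point.
    return w - lr * grad_fn(w + eps)


def local_train(w_global, grad_fn, steps=5, lr=0.1, rho=0.05):
    """Run a few SAM steps on one client, starting from the global weights."""
    w = w_global.copy()
    for _ in range(steps):
        w = sam_step(w, grad_fn, lr=lr, rho=rho)
    return w


def fedavg(client_weights, client_sizes):
    """Average client weights, weighted by each client's number of samples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def make_quadratic_grad(center):
    """Gradient of 0.5 * ||w - center||^2, a toy stand-in for a client loss."""
    return lambda w: w - center


def run_rounds(centers, sizes, rounds=20):
    """Alternate local SAM training and server-side averaging."""
    w = np.zeros_like(centers[0])
    for _ in range(rounds):
        updates = [local_train(w, make_quadratic_grad(c)) for c in centers]
        w = fedavg(updates, sizes)
    return w


if __name__ == "__main__":
    # Two hypothetical clients with different optima (heterogeneous data).
    centers = [np.array([0.0, 0.0]), np.array([4.0, 8.0])]
    print(run_rounds(centers, sizes=[1, 3]))
```

In this toy setting the global weights settle close to the size-weighted mean of the client optima. Only the weight vectors cross the client boundary, which is the property that lets federated pretraining avoid sharing raw data.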
Primary Subject Area: Federated Learning
Secondary Subject Area: Foundation Models
Registration Requirement: Yes
Reproducibility: https://github.com/KirchnerMax/FL-EndoViT
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Midl Latex Submission Checklist: Ensure no LaTeX errors during compilation., Replace NNN with your OpenReview submission ID., Includes \documentclass{midl}, \jmlryear{2026}, \jmlrworkshop, \jmlrvolume, \editors, and correct \bibliography command., Did not override options of the hyperref package., Did not use the times package., Use the correct spelling and format, avoid Unicode characters, and use LaTeX equivalents instead., Any math in the title and abstract must be enclosed within $...$., Did not override the bibliography style defined in midl.cls and did not use \begin{thebibliography} directly to insert references., Avoid using \scalebox; use \resizebox when needed., Included all necessary figures and removed *unused* files in the zip archive., Removed special formatting, visual annotations, and highlights used during rebuttal., All special characters in the paper and .bib file use LaTeX commands (e.g., \'e for é)., No separate supplementary PDF uploads., Acknowledgements, references, and appendix must start after the main content.
Latex Code: zip
Copyright Form: pdf
Submission Number: 148