A Vision Foundation Model for Cataract Surgery Using Joint-Embedding Predictive Architecture

Published: 27 Mar 2025, Last Modified: 03 Jun 2025 | MIDL 2025 Poster | CC BY 4.0
Keywords: Surgical Pretraining, Joint Embedding Predictive Network, Cataract Surgery
Abstract: Vision foundation models can automate the analysis of surgical videos and enable multiple applications that support patient care and surgical training. For cataract surgery, existing models are limited by their reliance on small datasets, by privacy concerns, and by poor generalizability across surgical settings. In this paper, we introduce JHU-VPT(JEPA), a self-supervised vision foundation model that uses the Joint-Embedding Predictive Architecture (JEPA) to learn spatiotemporal representations via latent feature prediction on a large corpus of unlabeled cataract videos, without requiring extensive labeled datasets or pixel-level reconstruction. JHU-VPT(JEPA) is pretrained on 2591 videos from multiple sites that capture variations in surgical technique and style. Comprehensive evaluations on step recognition, surgical feedback, and skill assessment tasks demonstrate that JHU-VPT(JEPA) outperforms existing methods. JHU-VPT(JEPA) remains effective even under attentive probing with a frozen encoder, which highlights the robustness of the learned features and addresses privacy concerns because downstream tasks do not require access to raw videos. Our approach offers a scalable, generalizable, and privacy-preserving solution for surgical video analysis, with significant potential to advance patient care and surgical education.
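Illustrative note: the attentive probing mentioned in the abstract can be sketched roughly as follows. This is a minimal, simplified sketch and not the authors' implementation: it assumes a single attention head with one learnable query and omits the key/value projections that a full cross-attention probe would normally include. All names (`attentive_pool`, `probe_logits`, `query`, `class_weights`, `class_bias`) are hypothetical. The point it illustrates is that only the frozen encoder's token features are consumed, not the raw video.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attentive_pool(tokens: Sequence[Vector], query: Vector) -> Vector:
    """Pool frozen-encoder tokens with one learnable query (assumed single head).

    Each token is scored against the query, the scores are normalized with
    softmax, and the tokens are averaged using those weights.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, t) * scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]


def probe_logits(
    tokens: Sequence[Vector],
    query: Vector,
    class_weights: Sequence[Vector],
    class_bias: Sequence[float],
) -> Vector:
    """Linear classifier on top of the pooled feature.

    Only `query`, `class_weights`, and `class_bias` would be trained;
    the encoder that produced `tokens` stays frozen.
    """
    pooled = attentive_pool(tokens, query)
    return [dot(w, pooled) + b for w, b in zip(class_weights, class_bias)]


if __name__ == "__main__":
    # Two toy 2-D token features standing in for frozen-encoder outputs.
    toy_tokens = [[1.0, 0.0], [0.0, 1.0]]
    toy_query = [0.0, 0.0]  # a zero query gives uniform attention weights
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(probe_logits(toy_tokens, toy_query, identity, [0.0, 0.0]))
```

In this simplified form the probe trains only the query and the linear head, so a downstream site would need to exchange precomputed features and not surgical videos, which is the privacy argument the abstract makes.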
Primary Subject Area: Foundation Models
Secondary Subject Area: Unsupervised Learning and Representation Learning
Paper Type: Validation or Application
Submission Number: 129