Domain-Constrained Distillation of DINOv3 into a Lightweight Foundation Model toward Point-of-Care Ultrasound
Keywords: DINOv3, Distillation, POCUS, Foundation Model, Domain Adaptation.
TL;DR: Modality-aware SSL is crucial for ultrasound. By distilling DINOv3 ViT-B/16 into a ResNet-50 using ultrasound-specific augmentations and 40 curated datasets, we learn representations that outperform baselines, especially with limited labels.
Abstract: Vision foundation models such as DINOv3 provide powerful representations but are too computationally demanding for point-of-care ultrasound (POCUS), whereas lightweight CNNs remain deployable yet brittle when faced with diverse anatomies and acquisition styles. We bridge this gap with a domain-constrained distillation framework that transfers DINOv3 ViT-B/16 knowledge into a compact ResNet-50, achieving roughly 3.4× compression while preserving the teacher’s billion-scale visual priors. Using a large, heterogeneous ultrasound corpus and physics-aware augmentations, the distilled model delivers substantial linear-probe improvements over standard CNN baselines and consistently outperforms the ViT teacher on challenging, heterogeneous datasets. It further offers marked gains in limited-label regimes, reflecting the realities of POCUS workflows where annotated data are scarce. Embedding visualizations show that the distilled encoder forms clearer, anatomy-aware clusters than the teacher, indicating successful alignment to ultrasound structure. Together, these results demonstrate that large-scale natural-image priors can be distilled into a lightweight, generalizable encoder suitable for resource-constrained clinical deployment.
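For concreteness, below is a minimal PyTorch sketch of the kind of feature-distillation setup the abstract describes: a ResNet-50 student whose pooled features are projected into the frozen DINOv3 ViT-B/16 teacher's 768-d embedding space and aligned with a cosine loss. The projection head, the loss form, and the names `DistilledStudent` and `distill_step` are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch of the distillation objective; details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class DistilledStudent(nn.Module):
    """ResNet-50 student with a linear head projecting its 2048-d pooled
    features into the teacher's 768-d (ViT-B/16) embedding space."""
    def __init__(self, teacher_dim: int = 768):
        super().__init__()
        backbone = resnet50(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.proj = nn.Linear(2048, teacher_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(x).flatten(1)  # (B, 2048)
        return self.proj(feats)             # (B, teacher_dim)

def distill_step(student, teacher, images, optimizer):
    """One step: align student embeddings with frozen teacher embeddings.
    Assumes `teacher(images)` returns (B, 768) global features."""
    with torch.no_grad():
        t = F.normalize(teacher(images), dim=-1)
    s = F.normalize(student(images), dim=-1)
    loss = (1.0 - (s * t).sum(dim=-1)).mean()  # cosine-distance alignment
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Likewise, a hedged sketch of what "physics-aware" ultrasound augmentations might look like: multiplicative speckle noise and depth-dependent attenuation are common ultrasound-specific transforms, but the paper's actual augmentation set is not specified here, so treat these as illustrative.

```python
# Illustrative ultrasound-style augmentations; img is (C, H, W) in [0, 1].
import torch

def speckle_noise(img: torch.Tensor, sigma: float = 0.15) -> torch.Tensor:
    """Multiplicative speckle: img * (1 + N(0, sigma)), clipped to [0, 1]."""
    noise = torch.randn_like(img) * sigma
    return (img * (1.0 + noise)).clamp(0.0, 1.0)

def depth_attenuation(img: torch.Tensor, max_drop: float = 0.5) -> torch.Tensor:
    """Linearly darken with depth (image rows), mimicking acoustic
    attenuation along the beam axis."""
    h = img.shape[-2]
    gain = torch.linspace(1.0, 1.0 - max_drop, h).view(1, h, 1)
    return (img * gain).clamp(0.0, 1.0)
```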
Primary Subject Area: Foundation Models
Secondary Subject Area: Transfer Learning and Domain Adaptation
Registration Requirement: Yes
Visa & Travel: Yes
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 328