Keywords: Self-supervised learning, Multi-view learning, Veterinary radiology, Contrastive learning
TL;DR: We introduce VET-DINO, a self-supervised method that learns anatomical understanding from real multi-view radiographs, outperforming purely synthetic augmentation–based approaches.
Abstract: Self-supervised learning has emerged as a powerful paradigm for training deep neural networks, particularly in medical imaging where labeled data is scarce. While current approaches typically rely on synthetic augmentations of single images, we propose VET-DINO, a framework that leverages a unique characteristic of medical imaging: the availability of multiple standardized views from the same study. Using clinical veterinary radiographs from the same patient, we enable models to learn view-invariant anatomical structures and develop an implied 3D understanding from 2D projections. We demonstrate our approach on a dataset of five million radiographs from 668,000 canine studies. Through extensive experiments, including view synthesis and downstream task evaluation, we show that learning from real multi-view pairs leads to superior anatomical understanding compared to synthetic augmentations. VET-DINO achieves state-of-the-art performance on multiple veterinary imaging tasks and establishes a new paradigm for self-supervised learning in medical imaging that exploits domain-specific structure rather than merely adapting natural image techniques.
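The core idea in the abstract — treating multiple real radiographic views from the same study as positive pairs in a DINO-style student/teacher objective, rather than generating views via synthetic augmentation — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, temperatures, and the simplified centering are assumptions, and DINO details such as momentum teacher updates and multi-crop are omitted.

```python
import numpy as np

def softmax(logits, temp):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_multiview_loss(student_out, teacher_out, center,
                        student_temp=0.1, teacher_temp=0.04):
    """Hypothetical sketch of a DINO-style loss where the 'views' are
    real radiographs from one study (e.g. lateral and ventrodorsal),
    not synthetic crops of a single image.

    student_out / teacher_out: lists of per-view prototype logits.
    center: running center subtracted from teacher logits (collapse guard).
    Cross-entropy is averaged over all ordered pairs of distinct views,
    so the student must match the teacher's output across viewpoints,
    encouraging view-invariant anatomical features.
    """
    n_views = len(student_out)
    total, n_pairs = 0.0, 0
    for i in range(n_views):              # teacher sees view i
        t = softmax(teacher_out[i] - center, teacher_temp)
        for j in range(n_views):          # student sees a different view j
            if i == j:
                continue
            s = softmax(student_out[j], student_temp)
            total += -(t * np.log(s + 1e-12)).sum()
            n_pairs += 1
    return total / n_pairs
```

In this sketch the cross-view pairing is what replaces synthetic augmentation: because the two inputs are genuinely different 2D projections of the same anatomy, matching their representations pushes the model toward the implied 3D understanding the abstract describes.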
Primary Subject Area: Unsupervised Learning and Representation Learning
Secondary Subject Area: Foundation Models
Submission Number: 90