Keywords: Breast Cancer, Deep Learning, Risk Prediction, DINOv3, Vision Transformers, Hybrid Networks
TL;DR: A hybrid CNN–ViT model with explicit contralateral asymmetry modelling can predict 3-year breast cancer risk from low-resolution mammograms as accurately as high-resolution models like Mirai, while using ~13× fewer input pixels.
Abstract: Breast cancer screening programmes increasingly seek to move from one-size-fits-all intervals to risk-adapted, personalised strategies. The advent of deep learning (DL) has produced a wave of image-based risk models that estimate short- to medium-term risk (1-5 years) more accurately than traditional risk models. Existing image-based risk models, such as Mirai, achieve strong discrimination but typically rely on convolutional backbones, ultra-high-resolution inputs and relatively simple multi-view fusion, with limited explicit modelling of contralateral asymmetry.
We hypothesised that combining complementary inductive biases (convolutional and transformer-based) with explicit contralateral asymmetry modelling would match state-of-the-art 3-year risk prediction performance even on substantially lower-resolution mammograms; in other words, that using less detailed images in a more structured way can recover state-of-the-art accuracy.
In this work, we present MamaDino, a hybrid network that fuses frozen self-supervised DINOv3 (ViT-S) features with a trainable CNN encoder at 512×512 resolution and aggregates left-right breast information via a BilateralMixer to predict a 3-year breast cancer risk score. We train on 53,883 women from OPTIMAM, a UK cohort, and evaluate on matched 3-year case-control cohorts: an in-distribution test set from four UK screening sites and an external out-of-distribution test set from an unseen site.
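The fusion and left-right aggregation described above can be illustrated with a minimal sketch. The abstract does not specify the internals of the BilateralMixer, so everything here is an assumption: `fuse_features` and `bilateral_mix` are hypothetical names, feature vectors are plain Python lists, and the mixing rule (mean plus absolute difference) is just one simple way to make contralateral asymmetry explicit, not the authors' method.

```python
from typing import List

Vector = List[float]


def fuse_features(vit_feat: Vector, cnn_feat: Vector) -> Vector:
    """Concatenate frozen-ViT and trainable-CNN features for one breast.

    Hypothetical fusion rule: the abstract only says the two feature
    sources are fused, not how.
    """
    return list(vit_feat) + list(cnn_feat)


def bilateral_mix(left: Vector, right: Vector) -> Vector:
    """Hypothetical symmetric left-right mixing.

    Returns the element-wise mean followed by the element-wise absolute
    difference. Both terms are unchanged when left and right are swapped,
    and the difference term exposes asymmetry between the breasts.
    """
    if len(left) != len(right):
        raise ValueError("left and right feature vectors must match in length")
    mean = [(l + r) / 2.0 for l, r in zip(left, right)]
    asym = [abs(l - r) for l, r in zip(left, right)]
    return mean + asym


# Toy feature values, purely illustrative.
left_breast = fuse_features([1.0, 2.0], [3.0])
right_breast = fuse_features([1.0, 4.0], [1.0])
mixed = bilateral_mix(left_breast, right_breast)
```

In a real model the mixed vector would feed a trainable risk head; the sketch stops at the feature level because the abstract gives no further detail.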
At breast-level granularity, MamaDino matched Mirai's 3-year risk prediction on both the internal and external test sets while using $\sim13\times$ fewer input pixels.
With the BilateralMixer added, MamaDino achieved an AUC of $0.736$ (vs Mirai's $0.713$) on the in-distribution test set and $0.677$ (vs $0.666$) on the external test set, with consistent performance across age, ethnicity, scanner, tumour type, and grade.
These findings demonstrate that explicit contralateral modelling and complementary inductive biases enable predictions that match Mirai, despite operating on substantially lower-resolution mammograms.
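As a sanity check on the pixel-count comparison, the arithmetic can be sketched as follows. The 512×512 input size comes from the abstract; the 1664×2048 Mirai input size is an assumption taken from Mirai's published preprocessing and is not stated here.

```python
MAMADINO_SIZE = (512, 512)   # input resolution stated in the abstract
MIRAI_SIZE = (1664, 2048)    # assumption: not stated in this abstract


def pixel_count(size):
    """Number of pixels in an image of the given (width, height)."""
    width, height = size
    return width * height


ratio = pixel_count(MIRAI_SIZE) / pixel_count(MAMADINO_SIZE)
print(ratio)  # 13.0
```

Under that assumption the ratio is 3,407,872 / 262,144 = 13, consistent with the reported figure.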
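For readers less familiar with the metric, the AUC on a case-control set is the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. A minimal sketch of that definition follows; the function name `auc` and the scores are illustrative only and are not data from the study.

```python
def auc(case_scores, control_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0      # case ranked above control
            elif case == control:
                wins += 0.5      # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Toy risk scores, purely illustrative.
toy_cases = [0.9, 0.6, 0.4]
toy_controls = [0.5, 0.3, 0.4]
toy_auc = auc(toy_cases, toy_controls)  # 7.5 of 9 pairs
```

The quadratic loop is fine for illustration; a rank-based implementation would be preferable for cohorts of the size reported here.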
Primary Subject Area: Detection and Diagnosis
Secondary Subject Area: Application: Radiology
Registration Requirement: Yes
Visa & Travel: No
Read CFP & Author Instructions: Yes
Originality Policy: Yes
Single-blind & Not Under Review Elsewhere: Yes
LLM Policy: Yes
Submission Number: 107