\section{Introduction}

Prostate cancer is the second most common cancer in men, the fourth most common overall, and the eighth leading cause of cancer-related deaths worldwide \cite{bray2024global}. While histologically confirmed biopsies remain the gold standard for diagnosis and for grading tumor aggressiveness, they are invasive, prone to both overdiagnosis of clinically insignificant cancers and underdiagnosis of clinically significant cancers, and carry risks of infection and sepsis \cite{loeb2013systematic,borghesi2017complications}. Prostate-specific antigen (PSA) testing and digital rectal examination (DRE) are commonly used for screening, but both have limited specificity and can lead to unnecessary follow-up procedures \cite{grossman2018screening,jones2018diagnostic}. To improve the detection of clinically significant prostate cancer (csPCa), multiparametric magnetic resonance imaging (mpMRI), which includes T$_2$-weighted (T$_2$w) imaging, diffusion-weighted imaging (DWI) with high b-value (HBV) images and apparent diffusion coefficient (ADC) maps, and dynamic contrast-enhanced (DCE) imaging, has become the standard pre-biopsy method \cite{EAU2019, dasgupta2019nice}. To address concerns about contrast agent use and scan time, bi-parametric MRI (bpMRI), which excludes DCE, has gained clinical traction while maintaining diagnostic performance comparable to that of mpMRI \cite{tamada2021comparison, twilt2025evaluating}. Its interpretation is guided by the Prostate Imaging-Reporting and Data System version 2.1 (PI-RADS v2.1) \cite{park2021performance}. However, diagnostic accuracy still depends on radiologist expertise and suffers from inter-reader variability \cite{wei2021diagnostic}.

AI-driven computer-aided diagnosis (CAD) systems can help reduce inter-reader variability in prostate MRI by automating detection and interpretation. To support the development, fair evaluation, and comparison of such algorithms, grand challenges\footnote{\url{https://grand-challenge.org/}} provide unbiased and standardized benchmarking using hidden test sets~\cite{van2021artificial}. Earlier efforts for prostate cancer detection, like ProstateX~\cite{armato2018prostatex}, were limited by small, single-center datasets. To overcome these limitations, the multi-center Prostate Imaging–Cancer Artificial Intelligence (PI-CAI) challenge~\cite{saha2024artificial} included 10,207 prostate MRI scans from 9,129 patients across four centers and showed that deep learning ensembles outperformed radiologists. These results highlight the potential of scalable deep learning approaches in prostate MRI analysis. However, assembling such large, expertly annotated datasets is resource-intensive, and most clinical centers have far more unlabeled than annotated prostate MRI scans available, which motivates learning strategies that can exploit unlabeled data.


Self-supervised learning (SSL) has gained traction in data-rich domains, where a model is pretrained on a pretext task to learn robust feature representations from unlabeled data. Pretext tasks are self-supervised objectives, such as masked restoration \cite{he2022masked} or contrastive learning \cite{chen2020simple}, that allow models to learn useful features without labels. While SSL has advanced fields like natural language processing (NLP) \cite{achiam2023gpt} and natural imaging \cite{simeoni2025dinov3}, its uptake in 3D medical imaging remains limited. The field still relies heavily on training from scratch or on costly supervised pretraining \cite{isensee2021nnu}, indicating a need for pretraining on unlabeled data. However, SSL adoption has lagged because prior 3D medical SSL studies often rely on small training datasets, on backbones that are suboptimal or outdated compared to strong CNNs like nnU-Net~\cite{isensee2021nnu}, and on evaluations that lack robust baselines and diverse testing, which ultimately limits generalizability. To address these gaps in 3D medical imaging, \cite{wald2025revisiting} proposed a comprehensive framework that overcomes the aforementioned limitations, demonstrating that masked autoencoders (MAE) combined with a strong backbone architecture outperform other self-supervised pretext tasks \cite{zhou2021models, wang2023mis, he2022masked, wu2024voco, tang2022self,chen2023masked}. Given the domain-specific nature of prostate MRI, with its distinct anatomy, imaging characteristics, and lesion distribution, it remains unclear whether these SSL findings translate to csPCa detection. This motivated us to investigate multiple pretext tasks in combination with strong architectural backbones and to benchmark them externally on the PI-CAI grand challenge.

SSL has been explored to some extent in the prostate cancer (PCa) domain.
Early work by \cite{bolous2021clinically} employed a restoration-based patch-swapping task, but performance was limited by the small training dataset. \cite{li2025cross} introduced a transformer-based contrastive learning framework inspired by SimCLR \cite{chen2020simple} with a multi-task objective; however, the method lacks comprehensive benchmarking across diverse datasets.
Large-scale SSL efforts, such as \cite{de2025self}, demonstrated improvements in PCa classification, but these approaches do not address lesion localization, which is crucial for MRI-guided biopsies. A transformer-based prostate-specific foundation model proposed by \cite{lee2025prostate} incorporated label-assisted pre-training and was evaluated on both internal and external cohorts, yet it still relies on supervised annotations. \cite{yuan2025z} proposed a restoration-based pretext task inspired by Zhou et al.~\cite{zhou2021models}, showing promising performance on the PI-CAI benchmark~\cite{saha2024artificial}.
In line with the need for stronger backbone architectures in the PCa detection domain, we previously proposed a UMamba-based multi-task learning model (UMamba-MTL) \cite{ma2024u, larsen2025prostate} and benchmarked it against conventional CNN backbones such as nnU-Net~\cite{isensee2021nnu}, as well as hybrid CNN-Transformer approaches like Swin-UNETR~\cite{hatamizadeh2021swin}.

Motivated by the need for pretraining on unlabeled data, we hypothesize that pairing large-scale unlabeled bpMRI with the top-performing SSL pretext tasks identified by \cite{wald2025revisiting} can substantially improve PCa detection performance. To test this hypothesis, we pretrain the UMamba backbone \cite{larsen2025prostate} using these SSL objectives on a large in-house unlabeled dataset and evaluate the resulting models on the PI-CAI benchmark alongside the out-of-distribution (OOD) Prostate158 dataset \cite{adams2022prostate158}.

Our main contributions in support of this hypothesis are as follows:

\begin{itemize}
    \item We leverage a large-scale unlabeled in-house bpMRI dataset \textbf{(N=2,431)}, combined with publicly available datasets, for SSL pretraining and evaluate the resulting models on a large hidden cohort. To our knowledge, this makes \textbf{UMamba}-based \textbf{P}rostate \textbf{S}elf-\textbf{S}upervised \textbf{L}earning (\textbf{UMamba-ProSSL}) the first large-scale SSL framework for csPCa lesion detection.
    
    \item We systematically compare multiple state-of-the-art pretext tasks under identical training conditions and find that \textbf{MAE} yields the largest transfer-learning gains.
    
    \item Our proposed framework (UMamba-ProSSL) integrates self-supervised MAE pretraining with multi-task fine-tuning, achieves \textbf{first place} on the international PI-CAI grand challenge benchmark, and demonstrates robust performance on the external out-of-distribution Prostate158 dataset.

\end{itemize}

