\appendix

\section{Clinical Information}
\label{appendix:clinical-variables}


\tableref{tab:clinical_info_train,tab:clinical_info_hidden,tab:clinical_info_inhouse} summarize the clinical characteristics and origin of the datasets used in this study.



% Table 1: PI-CAI Training cohort
\begin{table}[ht]
\centering
% \resizebox{\textwidth}{!}{%
\begin{tabular}{lccc}
\hline
 & \multicolumn{3}{c}{PI-CAI Training cohort} \\
 & RUMC & ZGT & PCNN \\
\hline
Sites & 2 & 1 & 8 \\
Patients & 792 & 346 & 338 \\
Median age, years & 65 (60-69) & 67 (62-72) & 68 (63-72) \\
Median PSA, ng/mL & 9 (6-14) & 7 (5-11) & 9 (6-12) \\
Median prostate volume, mL & 63 (45-88) & 49 (36-70) & 50 (35-70) \\
Field strength, Tesla & 1.5, 3 & 3 & 1.5, 3 \\
Cases & 800 & 350 & 350 \\
Clinically significant prostate \\ cancer (Gleason grade group $\geq$ 2) & 236 (30\%) & 80 (23\%) & 109 (31\%) \\
Positive MRI lesions & 614 & 186 & 287 \\
PI-RADS 3 & 149 (24\%) & 32 (17\%) & 65 (23\%) \\
PI-RADS 4 & 226 (37\%) & 71 (38\%) & 141 (49\%) \\
PI-RADS 5 & 239 (39\%) & 83 (45\%) & 81 (28\%) \\
Gleason grade group 1 & 150 (36\%) & 74 (45\%) & 87 (43\%) \\
Gleason grade group 2 & 136 (33\%) & 46 (28\%) & 78 (39\%) \\
Gleason grade group 3 & 64 (16\%) & 21 (13\%) & 24 (12\%) \\
Gleason grade group 4 & 28 (7\%) & 6 (4\%) & 7 (3\%) \\
Gleason grade group 5 & 33 (8\%) & 16 (10\%) & 6 (3\%) \\
\hline
\end{tabular}
\caption{Clinical variables and statistics for the PI-CAI Training cohort. Data are presented as n, n (\%), or median (IQR). Abbreviations: IQR - interquartile range, PCNN - Prostaat Centrum Noord-Nederland, PI-RADS - Prostate Imaging Reporting and Data System, PSA - prostate-specific antigen, RUMC - Radboud University Medical Center, ZGT - Ziekenhuisgroep Twente.}
\label{tab:clinical_info_train}
\end{table}

% Table 2A: PI-CAI Hidden tuning cohort
\begin{table}[ht]
\centering
\begin{tabular}{lccc}
\hline
 & \multicolumn{3}{c}{PI-CAI Hidden tuning cohort} \\
 & RUMC & ZGT & PCNN \\
\hline
Sites & 2 & 1 & 3 \\
Patients & 40 & 30 & 30 \\
Median age, years & 64 (58-70) & 66 (61-71) & 66 (60-74) \\
Median PSA, ng/mL & 8 (5-11) & 8 (6-11) & 9 (6-14) \\
Median prostate volume, mL & 64 (46-91) & 46 (35-54) & 42 (30-65) \\
Field strength, Tesla & 3 & 3 & 1.5, 3 \\
Cases & 40 & 30 & 30 \\
Clinically significant prostate \\ cancer (Gleason grade group $\geq$ 2) & 16 (40\%) & 12 (40\%) & 13 (43\%) \\
Positive MRI lesions & 21 & 25 & 33 \\
PI-RADS 3 & 4 (19\%) & 3 (12\%) & 7 (21\%) \\
PI-RADS 4 & 10 (48\%) & 7 (28\%) & 17 (52\%) \\
PI-RADS 5 & 7 (33\%) & 15 (60\%) & 9 (27\%) \\
Gleason grade group 1 & 6 (24\%) & 13 (52\%) & 8 (35\%) \\
Gleason grade group 2 & 8 (32\%) & 7 (28\%) & 8 (35\%) \\
Gleason grade group 3 & 5 (20\%) & 1 (4\%) & 4 (17\%) \\
Gleason grade group 4 & 2 (8\%) & 1 (4\%) & 2 (9\%) \\
Gleason grade group 5 & 4 (16\%) & 3 (12\%) & 1 (4\%) \\
\hline
\end{tabular}
\caption{Clinical variables and statistics for the PI-CAI Hidden tuning cohort. Data are presented as n, n (\%), or median (IQR). Abbreviations: IQR - interquartile range, PCNN - Prostaat Centrum Noord-Nederland, PI-RADS - Prostate Imaging Reporting and Data System, PSA - prostate-specific antigen, RUMC - Radboud University Medical Center, ZGT - Ziekenhuisgroep Twente.}
\label{tab:clinical_info_hidden}
\end{table}

% Table 2B: In-House cohort
\begin{table}[ht]
\centering
\begin{tabular}{lc}
\hline
 & In-House cohort \\
 & STOH \\
\hline
Sites & 1 \\
Patients & 200 \\
Median age, years & 66 (60-69) \\
Median PSA, ng/mL & 7 (5-12) \\
Median prostate volume, mL & 50 (36-71) \\
Field strength, Tesla & 3 \\
Cases & 200 \\
Clinically significant prostate \\ cancer (Gleason grade group $\geq$ 2) & 80 (40\%) \\
Positive MRI lesions & 131 \\
PI-RADS 3 & 29 (23\%) \\
PI-RADS 4 & 34 (25\%) \\
PI-RADS 5 & 68 (52\%) \\
Gleason grade group 1 & 23 (18\%) \\
Gleason grade group 2 & 40 (30\%) \\
Gleason grade group 3 & 39 (30\%) \\
Gleason grade group 4 & 14 (10\%) \\
Gleason grade group 5 & 15 (12\%) \\
\hline
\end{tabular}
\caption{Clinical variables and statistics for the In-House cohort. Data are presented as n, n (\%), or median (IQR). Abbreviations: IQR - interquartile range, PI-RADS - Prostate Imaging Reporting and Data System, PSA - prostate-specific antigen, STOH - St. Olav's Hospital, Trondheim University Hospital.}
\label{tab:clinical_info_inhouse}
\end{table}
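The entries in \tableref{tab:clinical_info_train,tab:clinical_info_hidden,tab:clinical_info_inhouse} follow the n (\%) and median (IQR) conventions stated in the captions; for example, 236 clinically significant cases out of 800 RUMC cases is 29.5\%, reported as 30\%. The following is a minimal sketch of how such entries could be formatted. The helper names are hypothetical, and both the quartile method (Python's \texttt{statistics.quantiles} with \texttt{method='inclusive'}) and round-half-up rounding are assumptions, since the exact conventions are not stated.

```python
import math
import statistics


def round_half_up(x: float) -> int:
    """Round a non-negative number to the nearest integer, rounding .5 upwards.

    Python's built-in round() rounds halves to even (round(28.5) == 28),
    so floor(x + 0.5) is used instead.
    """
    return math.floor(x + 0.5)


def count_pct(k: int, n: int) -> str:
    """Format a count as 'k (p%)', with p rounded to a whole percent."""
    return f"{k} ({round_half_up(100 * k / n)}%)"


def median_iqr(values) -> str:
    """Format 'median (Q1-Q3)' with all three values rounded to integers.

    Assumes the 'inclusive' quartile method; other methods can give
    slightly different quartiles on small samples.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{round_half_up(med)} ({round_half_up(q1)}-{round_half_up(q3)})"


if __name__ == "__main__":
    print(count_pct(236, 800))  # 236 (30%)  (csPCa cases, RUMC training data)
    print(median_iqr([60, 62, 65, 66, 69]))  # 65 (62-66)  (toy ages, not study data)
```

Rounding half up matters here because values such as 29.5\% occur exactly in the tables.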

\FloatBarrier
\section{Data Augmentation}
\label{appendix:data-aug}

\tableref{tab:augmentation} lists the preprocessing and augmentation transforms applied during training and validation of the models in this study. All transforms were implemented using MONAI \cite{consortium_monai_2024}.

\begin{filecontents*}{data_augmentation.csv}
Augmentation;Parameter
Spacing*; $(0.5, 0.5, 3.0)$ mm
Crop or Pad*; $(256, 256, 20^{\dagger})$
Z-score normalization*; Channel-wise
Random Flip; Along each axis
Random Gaussian Smoothing; $\sigma = (0.5, 1.0)$
Random Scale Intensity; 10\%
Random Shift Intensity; 10\%
Random Gaussian Noise; mean = 0, std = 0.1
Random Affine; rotate = (0.15, 0.15, 0)
\end{filecontents*}


\begin{table}[h!]
    \centering
    \csvautobooktabular[separator=semicolon]{data_augmentation.csv}
    \caption{Preprocessing and augmentation transforms. Transforms marked with * were applied to all splits; the remaining transforms were applied to the training split only. $\dagger$ The Z dimension was set to 32 for Swin UNETR, as this is the smallest size the architecture supports.}
    \label{tab:augmentation}
\end{table}
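To make the deterministic rows of \tableref{tab:augmentation} (marked *) concrete, the following NumPy sketch applies a centred crop or pad to $(256, 256, 20)$ followed by channel-wise Z-score normalization. It is an illustration under assumptions, not the MONAI pipeline used in this study: the channel-first layout (e.g., one channel per MRI sequence), zero padding, and applying the steps in the order of the table are assumed, and resampling to the target spacing is omitted. In MONAI, comparable behaviour is provided by transforms such as \texttt{Spacing}, \texttt{ResizeWithPadOrCrop}, and \texttt{NormalizeIntensity} with \texttt{channel\_wise=True}.

```python
import numpy as np


def center_crop_or_pad(image: np.ndarray, target) -> np.ndarray:
    """Centre-crop and/or zero-pad the spatial axes (all but axis 0) to `target`."""
    out = image
    for axis, size in enumerate(target, start=1):
        cur = out.shape[axis]
        if cur < size:
            # Pad symmetrically; any odd remainder goes to the end.
            before = (size - cur) // 2
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, size - cur - before)
            out = np.pad(out, pad)  # constant (zero) padding by default
        elif cur > size:
            # Keep the central `size` slices along this axis.
            start = (cur - size) // 2
            out = np.take(out, np.arange(start, start + size), axis=axis)
    return out


def zscore_channelwise(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalise each channel (axis 0) to zero mean and unit standard deviation."""
    spatial_axes = tuple(range(1, image.ndim))
    mean = image.mean(axis=spatial_axes, keepdims=True)
    std = image.std(axis=spatial_axes, keepdims=True)
    return (image - mean) / (std + eps)


def preprocess(image: np.ndarray, target=(256, 256, 20)) -> np.ndarray:
    """Crop/pad, then Z-score normalise a channel-first 3D volume."""
    return zscore_channelwise(center_crop_or_pad(image.astype(np.float32), target))
```

For Swin UNETR, calling \texttt{preprocess(image, target=(256, 256, 32))} reflects the adjusted depth noted in the table caption.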

\section{Model Efficiency}
\label{appendix:model-efficiency}

\tableref{tab:efficiency} reports the parameter count, training time, and inference time of each model evaluated in this study.

\begin{filecontents*}{efficiency.csv}
Model,Parameters,Training time,Inference time
U-Mamba MTL Single (ours),73.6M,15 h,1.1 s
U-Mamba MTL Dual (ours),114M,16.5 h,0.9 s
U-Mamba,73.6M,12.5 h,1.0 s
Swin UNETR,72.8M,34 h,1.3 s
U-Net,31.8M,N/A,1.3 s
nnUNet,44.8M,N/A,31 s
nnDet,24.7M,N/A,105 s
\end{filecontents*}

\begin{table}[h]
    \centering
    \csvautobooktabular[separator=comma]{efficiency.csv}
    \caption{Model efficiency. Training time is given per fold, for 200 epochs on a single A100 GPU with 80 GB of VRAM. Training times are not available for the PI-CAI baselines (U-Net, nnUNet, nnDet), as these models were trained by the PI-CAI organizers. Inference time covers the full inference pipeline (e.g., predictions from all five folds and test-time augmentation).}
    \label{tab:efficiency}
\end{table}
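As a rough illustration of what the inference times in \tableref{tab:efficiency} cover, the sketch below times one end-to-end prediction that averages five fold models, each combined with flip-based test-time augmentation. The fold models are hypothetical stand-ins (simple functions), the set of flips is an assumption, and real timings also depend on hardware, I/O, and preprocessing, none of which is modelled here.

```python
import time

import numpy as np

# Hypothetical TTA set: identity plus flips along the first two axes.
TTA_AXES = [(), (0,), (1,)]


def predict_with_tta(model, volume: np.ndarray) -> np.ndarray:
    """Average one model's output over flipped copies of the input."""
    preds = []
    for axes in TTA_AXES:
        x = np.flip(volume, axis=axes) if axes else volume
        y = model(x)
        # Undo the flip so all predictions share the original orientation.
        preds.append(np.flip(y, axis=axes) if axes else y)
    return np.mean(preds, axis=0)


def ensemble_predict(fold_models, volume: np.ndarray) -> np.ndarray:
    """Average the TTA predictions of all fold models."""
    return np.mean([predict_with_tta(m, volume) for m in fold_models], axis=0)


def time_pipeline(fold_models, volume: np.ndarray, repeats: int = 3):
    """Return the ensemble prediction and the mean wall-clock time per run (s)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        result = ensemble_predict(fold_models, volume)
    seconds = (time.perf_counter() - start) / repeats
    return result, seconds


if __name__ == "__main__":
    # Five stand-in "fold models" that only rescale their input.
    folds = [lambda x, s=s: x * s for s in (0.8, 0.9, 1.0, 1.1, 1.2)]
    vol = np.random.default_rng(0).random((64, 64, 20))
    pred, secs = time_pipeline(folds, vol)
    print(pred.shape, f"{secs:.4f} s per case")
```

Because every fold model is run once per TTA variant, cost grows with (number of folds) $\times$ (number of TTA variants), which is one reason pipeline-level inference times can differ widely between methods.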

\section{Zonal Segmentation}
\label{appendix:zonal-evaluation}

Although zonal segmentation is only an auxiliary task for our U-Mamba MTL models, the predicted zonal masks could be useful for downstream tasks if their quality is sufficient. To assess their accuracy, we ran inference on the 200 patients of the in-house dataset and the 158 patients of the Prostate158 dataset, and compared the predicted masks with the ground truth using the Dice similarity coefficient (DSC) (\figureref{fig:zonal-dsc}). For Prostate158, the DSC is computed against the annotations of reader 1.
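For reference, the DSC between a predicted mask $P$ and a ground-truth mask $G$ is $\mathrm{DSC} = 2|P \cap G| / (|P| + |G|)$, ranging from 0 (no overlap) to 1 (perfect overlap). The sketch below computes it per zone from integer label maps. The label values (1 for PZ, 2 for TZ) are a hypothetical convention, and scoring two empty masks as 1 is one common choice among several; neither is taken from this study.

```python
import numpy as np

# Hypothetical label convention for the zonal masks.
ZONE_LABELS = {"PZ": 1, "TZ": 2}


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def zonal_dice(pred_labels: np.ndarray, gt_labels: np.ndarray) -> dict:
    """DSC for each zone, comparing one label value at a time."""
    return {
        zone: dice(pred_labels == value, gt_labels == value)
        for zone, value in ZONE_LABELS.items()
    }


if __name__ == "__main__":
    gt = np.zeros((4, 4), dtype=int)
    gt[:2] = 1  # PZ
    gt[2:] = 2  # TZ
    pred = gt.copy()
    pred[1] = 2  # one PZ row mislabelled as TZ
    # PZ = 2*4/(4+8) ~ 0.667, TZ = 2*8/(12+8) = 0.8
    print(zonal_dice(pred, gt))
```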

\begin{figure}[htb]
\floatconts
{fig:zonal-dsc}% label for whole figure
{\caption{Dice score distributions per prostate zone for the U-Mamba MTL models.}}% caption for whole figure
{%
\subfigure[In-House Dataset, U-Mamba MTL-Single][centred]{%
\label{fig:pic3}% label for this sub-figure
\includegraphics[width=0.33\textwidth]{figures/single_dlr.png}
}\qquad % space between images
\subfigure[Prostate158 Dataset, U-Mamba MTL-Single][centred]{%
\label{fig:pic4}% label for this sub-figure
\includegraphics[width=0.33\textwidth]{figures/single_158.png}
}\\ % new row
\subfigure[In-House Dataset, U-Mamba MTL-Dual][centred]{%
\label{fig:pic1}% label for this sub-figure
\includegraphics[width=0.33\textwidth]{figures/anatomy_dsc_dlr.png}
}\qquad % space between images
\subfigure[Prostate158 Dataset, U-Mamba MTL-Dual][centred]{%
\label{fig:pic2}% label for this sub-figure
\includegraphics[width=0.33\textwidth]{figures/dual_p158.png}
}
}
\end{figure}

The auxiliary zonal segmentation task yielded strong results for both U-Mamba MTL architectures on the in-house and Prostate158 datasets. Both models achieved DSCs of 0.76 for the peripheral zone (PZ) and 0.87 for the transition zone (TZ), closely matching the reported inter-reader agreement ($\mathrm{DSC}_{\mathrm{PZ}}=0.75$, $\mathrm{DSC}_{\mathrm{TZ}}=0.87$). Notably, except for the ProstateX subset ($N=346$) of the PI-CAI Training set ($N=1500$), all zonal masks used for training were AI-generated. Because roughly 77\% of the training masks ($1154/1500$) were therefore not manually annotated, the models' zonal segmentation was learned largely through weak supervision.

