\section{Discussion and Conclusion}

This work shows that our U-Mamba MTL-Single model outperformed state-of-the-art baseline models for PCa detection in bpMRI and achieved zonal segmentation performance comparable to inter-reader variability on the Prostate158 dataset (\appendixref{appendix:zonal-evaluation}). It ranked 23rd out of 450 on the PI-CAI leaderboard, underscoring its competitiveness. Its strong AP and AUC scores indicate precise lesion localization and reliable patient-level classification, both of which are critical for guided prostate biopsy, where they can improve targeting accuracy and reduce unnecessary procedures. Although the 95\% confidence intervals overlap, both U-Mamba MTL variants achieved the highest scores on our out-of-distribution in-house dataset and on the PI-CAI development dataset, suggesting promising generalizability. These results highlight the value of integrating zonal anatomy, which improved PCa detection compared to using U-Mamba alone.

The experimental results show that the single-decoder MTL approach outperforms the dual-decoder MTL approach. However, since PCa detection and zonal segmentation are fundamentally different tasks, a better balance of shared and task-specific parameters may be achieved by splitting the decoder at an intermediate point between the bottleneck and the prediction heads. Identifying this optimal split point remains an avenue for future research.
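
One way to make this design space concrete is sketched below. The notation is ours and purely illustrative, not taken from the implementation. Let the decoder consist of $K$ stages $D_1, \dots, D_K$ ordered from the bottleneck to the prediction heads, let $z$ denote the bottleneck features (skip connections omitted for brevity), and let $H_t$ be the prediction head for task $t$. Sharing the first $k$ stages and splitting the remainder gives
\[
h = (D_k \circ \dots \circ D_1)(z), \qquad
\hat{y}_t = H_t\bigl((D^{t}_{K} \circ \dots \circ D^{t}_{k+1})(h)\bigr), \qquad
t \in \{\mathrm{PCa}, \mathrm{zone}\},
\]
where $D^{t}_{j}$ denotes a task-specific copy of stage $j$. Assuming that the single-decoder variant shares all stages and the dual-decoder variant shares none, these correspond to $k = K$ (no task-specific stages, $\hat{y}_t = H_t(h)$) and $k = 0$ (no shared stages, $h = z$), respectively. The partial splits discussed above are the intermediate values $0 < k < K$.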

All U-Mamba variants outperformed the baseline methods in terms of the combined score, except for the base U-Mamba on the PI-CAI hidden development dataset. This superior performance may be attributed to U-Mamba's enhanced ability to capture long-range dependencies, facilitated by the relatively large input size used in this study and the absence of patch-based learning. Additionally, its relatively high parameter count, comparable only to Swin UNETR, may have contributed to its effectiveness. However, further research is needed to confirm these factors' impact on performance.

Swin UNETR has an architectural limitation: it requires a minimum input size of 32 along each dimension. Since the average Z-dimension size of the bpMRI data used in this study is 20, the volumes had to be padded along this axis (roughly 12 of the 32 slices on average), which may have contributed to Swin UNETR's poor performance. While nnDetection demonstrated strong patient-level classification performance (AUC), its ability to localize PCa (AP) was among the lowest. This poor AP score is partly due to the nature of the nnDetection architecture, which produces only bounding boxes, unlike the other architectures in this study, which generate segmentation masks. Because AP counts a lesion candidate as a true positive only when it overlaps the ground truth by at least 10\%, the bounding-box-based output may have been a limiting factor.
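
To illustrate why a box-based output can limit the achievable overlap, consider a simplified case that is our own assumption rather than part of the evaluation protocol: overlap is measured as intersection over union (IoU), the lesion is a sphere of radius $r$, and the predicted candidate is its tight axis-aligned bounding box, a cube of side $2r$. Even this perfectly placed box reaches only
\[
\mathrm{IoU} = \frac{\frac{4}{3}\pi r^{3}}{(2r)^{3}} = \frac{\pi}{6} \approx 0.52,
\]
so any error in box position or size reduces the overlap from a ceiling of about $0.52$ rather than $1$. Lesions that are irregular or oblique to the image axes can fill their bounding box even less, lowering this ceiling further.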

The qualitative results highlight challenges in PCa detection. Some false positives were caused by hypointense areas in the ADC channel, which are often indicative of PCa (rows 2 and 3, \figureref{fig:qualitative}). Conversely, a false negative occurred in a region that lacked ADC hypointensity despite a ground truth annotation (row 3, \figureref{fig:qualitative}). Additionally, rows 1 and 3 illustrate improved PCa delineation by U-Mamba MTL-Single and MTL-Dual compared to the base U-Mamba, emphasizing the benefit of incorporating zonal anatomy context.

%--------------------------------------------------

% This relatively large input size, given the absence of patch-based learning, likely contributed to the superior performance of the U-Mamba architecture, as it leveraged its enhanced capacity for capturing long-range dependencies.

% Integrating zonal masks into the U-Mamba architecture using multi-task learning (MTL) demonstrated improvements over the unaltered U-Mamba in both our out-of-distribution in-house dataset (N=200) and the PI-CAI hidden development set (N=100), as measured by the aggregated score. Notably, the improvement on the in-house dataset was primarily attributed to the model's enhanced ability to detect tumor regions, as reflected in the AP metric. However, quantitative results on the PI-CAI hidden tuning cohort indicated that the unaltered U-Mamba outperformed its MTL variant in terms of AP, while nnDetection achieved the highest AUC. While a multi-task loss weight balancing factor $\beta$ of 0.2 yielded the best performance among the tested values (0.2 and 0.5), this hyperparameter warrants further investigation in future work. 

% Although our U-Mamba MTL model does outperform all introduced baselines in terms of the aggregated score on the PI-CAI hidden tuning cohort, it is important to note that other submissions to the PI-CAI leaderboard displayed superior performance, leaving our submission at 127th place out of 448 submissions at the time of writing. Testing on the PI-CAI hidden test set (N=1000), which is of higher  quality than the hidden development cohort, is subject to further research.

%with first place achieving 0.813,

% The Swin UNETR transformer-based baseline performed reasonably well on our in-house dataset but required padding the Z dimension to a fixed size of 32 to comply with architectural requirements. This padding, necessitated by the highly anisotropic nature of prostate MRIs (with an average of 20 slices in the Z dimension), increased computational overhead and may have negatively affected the model's performance.



% It is important to note the limited representativeness of the PI-CAI hidden tuning cohort (N=100), as evidenced by discrepancies between rankings on the hidden tuning cohort and the final PI-CAI rankings based on the hidden test set (N=1000). Unfortunately, access to the PI-CAI test set was restricted to the challenge, precluding further analysis of its 1000 cases.





