\section{Experiments}
We focus on three types of experiments: (i) an exhaustive comparative evaluation of noise robustness; (ii) ablation studies on the transposed convolution and bilinear interpolation layers; and (iii) ablation studies on the Newton approximation-scaled weight initialization.

\textbf{Datasets.} We evaluate on seven diverse medical imaging datasets spanning different modalities and anatomical regions: skin lesion (ISIC16~\cite{codella2018skin}, ISIC18~\cite{tschandl2018ham10000}), breast ultrasound (BUSI~\cite{al2020dataset}), polyp (KVASIR~\cite{jha2019kvasir}, SANET~\cite{wei2021shallow}), and dental caries lesion (ACTA~\cite{Gonzalez2025rt}, DCBR~\cite{tichy2023dental}) datasets. All details are provided in Appendix~\ref{ap:exp_data}.

\textbf{Noise simulation.} We inject four types of noise commonly encountered in medical imaging: Gaussian, Speckle, Poisson, and Rician noise. We also test robustness to brightness shifts and contrast variations. The complete range of values is provided in Appendix~\ref{ap:exp_noise}. Models are trained \textit{only on clean data}, using the original, unmodified images from each dataset; no noise augmentation or synthetic corruption is applied during training. Models are then evaluated on both the original test sets and their noise-corrupted counterparts, allowing us to measure out-of-the-box robustness without any noise-specific adaptation. The simulated perturbations are chosen to reflect distortions that arise naturally in clinical acquisition pipelines, namely thermal and electronic noise (Gaussian), ultrasound artifacts (Speckle), photon-counting noise in low-dose settings (Poisson), and MRI background noise (Rician), making the evaluation clinically grounded rather than purely synthetic.
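
For concreteness, the four noise types can be sketched with the following standard forward models for an image $x$ with intensities normalized to $[0,1]$. These are common textbook parameterizations written with the symbols of Table~\ref{tab:robustness}; they are an illustrative assumption, and the exact definitions and value ranges are those given in Appendix~\ref{ap:exp_noise}.
\begin{align*}
\text{Gaussian:}\quad & \tilde{x} = x + \epsilon, & \epsilon &\sim \mathcal{N}(0, \sigma_{g}^{2} I), \\
\text{Speckle:}\quad & \tilde{x} = x + x \odot \epsilon, & \epsilon &\sim \mathcal{N}(0, \sigma_{s}^{2} I), \\
\text{Poisson:}\quad & \tilde{x} = \tfrac{1}{\lambda}\, k, & k &\sim \mathrm{Poisson}(\lambda x), \\
\text{Rician:}\quad & \tilde{x} = \sqrt{(x + \epsilon_{1})^{2} + \epsilon_{2}^{2}}, & \epsilon_{1}, \epsilon_{2} &\sim \mathcal{N}(0, \sigma_{r}^{2} I).
\end{align*}
Under these assumed models, Gaussian noise is additive and signal-independent, Speckle noise is multiplicative, Poisson noise has a variance that scales with the signal, and Rician noise introduces a positive bias in dark regions.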

\textbf{Implementation details.} We investigate multiple U-Net variants, including U-Net++~\cite{zhou2019unet++} and a lightweight U-Net~\cite{ronneberger2015u} architecture with 4 levels and an initial feature dimension of 8. All hyperbolic models use a trainable curvature initialized to $c = 0.1$. We train with the Dice-Focal loss and the Riemannian Adam optimizer~\cite{becigneul2018riemannian} (learning rate $10^{-3}$, weight decay $10^{-4}$), with batch size 8, for 50 epochs. We deliberately avoid data augmentation to evaluate inherent geometric robustness. In the main paper, we show results for Hyperbolic U-Net with transposed convolution; the results with bilinear interpolation are in Appendix~\ref{ap:tc_vs_bi}. We also compare the robustness of our models with a lightweight nnU-Net~\cite{isensee2021nnu, isensee2024nnu} with the same architecture as our models. For this, we use the nnU-Net v2 implementation; details are given in Appendix~\ref{ap:nnunet}.
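
As a reference point, a common form of the Dice-Focal loss for binary segmentation is sketched below, with predicted foreground probabilities $p_i$ and ground-truth labels $g_i \in \{0,1\}$ over $N$ pixels. The smoothing constant $\delta$, the focusing parameter $\gamma$, and the unweighted sum of the two terms are assumptions made for illustration and are not specified in this section.
\begin{align*}
\mathcal{L}_{\mathrm{Dice}} &= 1 - \frac{2\sum_{i} p_i g_i + \delta}{\sum_{i} p_i + \sum_{i} g_i + \delta}, \\
\mathcal{L}_{\mathrm{Focal}} &= -\frac{1}{N}\sum_{i} \left(1 - p_{t,i}\right)^{\gamma} \log p_{t,i}, \qquad p_{t,i} = \begin{cases} p_i & \text{if } g_i = 1, \\ 1 - p_i & \text{otherwise,} \end{cases} \\
\mathcal{L} &= \mathcal{L}_{\mathrm{Dice}} + \mathcal{L}_{\mathrm{Focal}}.
\end{align*}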

\textbf{Evaluation metrics.} We report Dice score (DSC), mean Intersection over Union (mIoU), dataset IoU (dIoU), Hausdorff Distance (HD) and HD95. We focus on DSC in the main paper, with the other results in Appendix \ref{ap:full_results}. 
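
For reference, the standard definitions of the overlap and boundary metrics are as follows, for a predicted mask $P$ and a ground-truth mask $G$ with boundaries $\partial P$ and $\partial G$:
\begin{align*}
\mathrm{DSC}(P,G) &= \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\mathrm{IoU}(P,G) = \frac{|P \cap G|}{|P \cup G|}, \qquad
\mathrm{DSC} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}, \\
\mathrm{HD}(P,G) &= \max\Big\{ \max_{p \in \partial P} \min_{g \in \partial G} \|p - g\|_2,\; \max_{g \in \partial G} \min_{p \in \partial P} \|g - p\|_2 \Big\}.
\end{align*}
HD95 replaces the outer maxima over boundary points with the 95th percentile, which reduces sensitivity to isolated outlier pixels. As an assumption about the naming used here, mIoU can be read as the mean of per-image IoU values and dIoU as the IoU computed after pooling pixels over the whole dataset.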

\subsection{Robustness Through Hyperbolic Geometry}
\textbf{Overview.} Table~\ref{tab:robustness} presents DSC on clean data and under mid-to-high noise levels. For all runs, we use transposed convolutions in the decoder. On clean test sets, Hyperbolic U-Net achieves average performance comparable to Euclidean U-Net (average DSC: 0.765 for Hyperbolic vs. 0.770 for Euclidean). Under noise, Hyperbolic U-Net outperforms Euclidean U-Net across all datasets and noise types, with an average relative improvement of $48\%$ on Gaussian noise, $22\%$ on Speckle noise, $34\%$ on Poisson noise, $42\%$ on Rician noise, $17\%$ on Brightness shift and $17\%$ on Contrast variation. Specifically, we find that adding strong levels of noise hardly affects the DSC of Hyperbolic U-Net on most datasets, highlighting that a hyperbolic foundation makes U-Net inherently noise robust. In Appendices~\ref{ap:robust_noise_curves} and~\ref{ap:robust_unetpp}, we show more results on U-Net++ and nnU-Net architectures, which yield the same conclusions.
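
The average improvements above are consistent with the mean of the per-dataset relative DSC gains in Table~\ref{tab:robustness}; we spell out this reading as an assumption. For a perturbation $n$ and the set of datasets $\mathcal{D}$,
\[
\Delta_{n} = \frac{1}{|\mathcal{D}|} \sum_{d \in \mathcal{D}} \frac{\mathrm{DSC}^{\mathrm{hyp}}_{d,n} - \mathrm{DSC}^{\mathrm{euc}}_{d,n}}{\mathrm{DSC}^{\mathrm{euc}}_{d,n}}.
\]
As a worked example, for Gaussian noise with the datasets in table order (ISIC16, ISIC18, BUSI, SANET, KVASIR, ACTA, DCBR):
\[
\Delta_{\mathrm{Gaussian}} = \frac{1}{7}\left( \frac{0.38}{0.52} + \frac{0.32}{0.54} + \frac{0.38}{0.41} + \frac{0.18}{0.49} + \frac{0.22}{0.48} + \frac{0.04}{0.50} + \frac{0.12}{0.51} \right) \approx \frac{3.39}{7} \approx 0.48.
\]
The per-dataset gains range from about $8\%$ (ACTA) to about $93\%$ (BUSI), so the average hides considerable variation across datasets.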

\textbf{Noise curves and qualitative examples.} Figure~\ref{fig:dsc_degrade} shows DSC degradation curves across noise intensities for ISIC16. Hyperbolic U-Net/U-Net++ exhibit hardly any degradation, while the performance of their Euclidean counterparts collapses when noise increases. nnU-Net is more robust to noise than standard U-Net and U-Net++, due to the use of noise in data augmentation. However, even nnU-Net suffers from performance degradation. Our hyperbolic approach, only trained on clean data, obtains the best performance, especially when noise is most severe. Figure~\ref{fig:isic16_noise} provides qualitative comparisons.

\textbf{Geometric explanation of noise robustness.} 
To further investigate why Hyperbolic U-Net is more robust to noise, we analyze the geometric properties of hyperbolic embeddings. Hyperbolic space exhibits exponential volume growth, unlike the polynomial growth of Euclidean space, which leads to larger relative distances between points as a function of their norm. In the context of representation learning, this implies that features corresponding to different semantic classes are more widely separated, even when local variations exist. Consequently, perturbations introduced by noise are less likely to move a representation across class boundaries, effectively increasing the margin to the decision boundary.
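
The volume-growth argument can be made concrete in two dimensions. The following are standard facts about the hyperbolic plane of constant curvature $-c$ with $c > 0$, given here as an illustrative sketch that assumes the Poincar\'e ball model; they are not specific to our architecture. The area of a geodesic disk of radius $r$ is
\[
A_{\mathrm{hyp}}(r) = \frac{2\pi}{c}\left(\cosh(\sqrt{c}\, r) - 1\right) \;\sim\; \frac{\pi}{c}\, e^{\sqrt{c}\, r} \quad (r \to \infty),
\qquad
A_{\mathrm{euc}}(r) = \pi r^{2},
\]
and the geodesic distance of a point $x$ from the origin is
\[
d_{c}(0, x) = \frac{2}{\sqrt{c}} \operatorname{artanh}\!\left(\sqrt{c}\, \|x\|\right),
\]
which diverges as $\|x\| \to 1/\sqrt{c}$. Points with large norm are therefore far apart in geodesic distance even when their Euclidean coordinates are close, which is the effect the argument above relies on.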

To validate this explanation, we compute inter-class distances in the learned feature space (Appendix~\ref{ap:inter_class_distances}). We find that hyperbolic embeddings consistently exhibit higher class separation ratios than Euclidean embeddings. These results suggest that the observed robustness of Hyperbolic U-Net is rooted in its geometry: the hyperbolic representation space yields more noise-tolerant embeddings, which preserve segmentation accuracy under various noise conditions.

\begin{table}[t]
\centering
\small
\resizebox{\linewidth}{!}{%
\begin{tabular}{llccccccc}
\toprule
 \textbf{Dataset} & \textbf{Model} & \textbf{Clean} & \textbf{Gaussian} & \textbf{Speckle} & \textbf{Poisson} & \textbf{Rician} & \textbf{Brightness} & \textbf{Contrast} \\
\midrule
\multirow{2}{*}{ISIC16}
& U-Net & \emph{0.92} & 0.52 & 0.75 & 0.62 & 0.72 & 0.79 & 0.75 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.91} & \cellcolor{Gray} \textbf{0.90} & \cellcolor{Gray} \textbf{0.90} & \cellcolor{Gray} \textbf{0.90} & \cellcolor{Gray} \textbf{0.90} & \cellcolor{Gray} \textbf{0.86} & \cellcolor{Gray} \textbf{0.87} \\
\hline
\multirow{2}{*}{ISIC18}
& U-Net & \emph{0.89} & 0.54 & 0.55 & 0.59 & 0.46 & 0.79 & 0.73 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.87} & \cellcolor{Gray} \textbf{0.86} & \cellcolor{Gray} \textbf{0.86} & \cellcolor{Gray} \textbf{0.86} & \cellcolor{Gray} \textbf{0.86} & \cellcolor{Gray} \textbf{0.84} & \cellcolor{Gray} \textbf{0.79} \\
\hline
\multirow{2}{*}{BUSI}
& U-Net & \emph{0.82} & 0.41 & 0.63 & 0.55 & 0.41 & 0.57 & 0.56 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.80} & \cellcolor{Gray} \textbf{0.79} & \cellcolor{Gray} \textbf{0.79} & \cellcolor{Gray} \textbf{0.80} & \cellcolor{Gray} \textbf{0.79} & \cellcolor{Gray} \textbf{0.79} & \cellcolor{Gray} \textbf{0.76} \\
\hline
\multirow{2}{*}{SANET}
& U-Net & \emph{0.78} & 0.49 & 0.56 & 0.49 & 0.50 & 0.52 & 0.61 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.76} & \cellcolor{Gray} \textbf{0.67} & \cellcolor{Gray} \textbf{0.71} & \cellcolor{Gray} \textbf{0.69} & \cellcolor{Gray} \textbf{0.67} & \cellcolor{Gray} \textbf{0.67} & \cellcolor{Gray} \textbf{0.67} \\
\hline
\multirow{2}{*}{KVASIR}
& U-Net & \emph{0.86} & 0.48 & 0.65 & 0.50 & 0.55 & 0.64 & 0.55 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.83} & \cellcolor{Gray} \textbf{0.70} & \cellcolor{Gray} \textbf{0.71} & \cellcolor{Gray} \textbf{0.69} & \cellcolor{Gray} \textbf{0.73} & \cellcolor{Gray} \textbf{0.70} & \cellcolor{Gray} \textbf{0.76} \\
\hline
\multirow{2}{*}{ACTA}
& U-Net & \emph{0.50} & 0.50 & 0.50 & 0.50 & 0.50 & 0.50 & 0.50 \\
& \cellcolor{Gray} Hyp U-Net & \emph{0.54} & \cellcolor{Gray} \textbf{0.54} & \cellcolor{Gray} \textbf{0.54} & \cellcolor{Gray} \textbf{0.54} & \cellcolor{Gray} \textbf{0.53} & \cellcolor{Gray} \textbf{0.53} & \cellcolor{Gray} \textbf{0.52} \\
\hline
\multirow{2}{*}{DCBR}
& U-Net & \emph{0.62} & 0.51 & 0.54 & 0.51 & 0.51 & 0.48 & 0.53 \\
& \cellcolor{Gray}  Hyp U-Net & \emph{0.65} & \cellcolor{Gray} \textbf{0.63} & \cellcolor{Gray} \textbf{0.60} & \cellcolor{Gray} \textbf{0.60} & \cellcolor{Gray} \textbf{0.58} & \cellcolor{Gray} \textbf{0.58} & \cellcolor{Gray} \textbf{0.58} \\
\bottomrule
\end{tabular}}
\caption{\textbf{Robust medical image segmentation Dice scores.} We report the effect of hyperbolic U-Net versus a standard U-Net, with the same architecture and number of parameters, on seven datasets and six noise types. We use the following settings for all: Gaussian ($\sigma_{g}=0.2$), Speckle ($\sigma_{s}=0.3$), Poisson ($\lambda=10$), Rician ($\sigma_{r}=0.2$), Brightness ($\Delta_{b}=0.5$), Contrast ($\Delta_{c}=0.3$). We find that a hyperbolic U-Net is much more robust to noise, brightness and contrast shifts.}
\label{tab:robustness}
\end{table}

\begin{figure}[t]
\floatconts
  {fig:dsc_degrade}
  {\caption{\textbf{Performance curves (DSC) on ISIC16 for all noise types.} Hyperbolic U-Net and U-Net++ can handle strong noise, outperforming their Euclidean counterparts and nnU-Net.}}
  {\includegraphics[width=\linewidth]{figs/isic16_unet_unetpp_hunetpp_nnunet2_cv_hyp_dice_score.pdf}}
\end{figure}


\begin{figure}[t]
\floatconts
  {fig:isic16_noise}
  {\caption{\textbf{Qualitative example} of Hyperbolic versus Euclidean U-Net on a skin lesion image for all noise types. We show predictions under the following perturbations: Gaussian ($\sigma_{g}=0.2$), Speckle ($\sigma_{s}=0.3$), Poisson ($\lambda=10$), Rician ($\sigma_{r}=0.2$), Brightness ($\Delta_{b}=0.5$), Contrast ($\Delta_{c}=0.3$). Segmentation predictions of Hyperbolic U-Net remain relatively stable under mid-to-high levels of noise compared to Euclidean U-Net. We provide more examples in Appendix~\ref{ap:robust_qual_res}.}}
  {\includegraphics[width=0.75\linewidth]{figs/isic16_robustness_3.pdf}}
\end{figure}

\subsection{Transposed Convolution vs. Bilinear Upsampling in Hyperbolic Space}
This paper outlines two ways to perform upscaling: hyperbolic transposed convolution and hyperbolic bilinear upsampling. In Appendix~\ref{ap:tc_vs_bi}, we show that both achieve similar DSC on clean data as well as under noise. Bilinear upsampling, however, reduces the number of parameters by $57\%$, while the memory consumption ($\approx 1.4$ GB) and training time remain identical. We hypothesize that this may be because the GPUs we use are not currently optimized for hyperbolic operations. We conclude that both approaches are viable as decoders, with bilinear upsampling preferable in resource-constrained clinical settings.

Importantly, while both approaches are viable for standard U-Net decoders, we observe a practical difference when extending to nested decoder architectures such as U-Net++. In this setting, the hyperbolic transposed convolution variant could not be trained due to GPU memory exhaustion, whereas the bilinear upsampling variant remained trainable. We attribute this to the higher number of hyperbolic operations required by transposed convolution, which are repeatedly invoked in the nested computational graph of U-Net++, while bilinear upsampling is a single-shot operation.

\subsection{Effects of Newton-Approximation Weight Initialization} 
To validate our initialization, we compare against the identity initialization from \citeauthor{van2023poincare} and the normal distribution initialization from \citeauthor{shimizu2021hyperbolic} in two analyses:

\textbf{Feature diversity.} Figure~\ref{fig:weight_init} (left) visualizes feature maps obtained from random Gaussian blob inputs after a hyperbolic transposed convolution layer (input = 2D, output = 4D). Our initialization produces diverse features across channels, while identity initialization yields either near-zero or redundant feature maps, indicating poor gradient flow.

\textbf{Norm preservation.} Figure~\ref{fig:weight_init} (right) shows output-to-input norm ratios across different input-to-output feature ratios. We fix the number of input features to $16$ and vary the number of output features to obtain different input-to-output feature ratios. Only our initialization maintains stable norms, making it the most suitable way to initialize a Hyperbolic U-Net.
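
One way to formalize the plotted quantity, given here as an assumption rather than the exact definition behind the figure, is the expected output-to-input norm ratio of a layer $f_{\theta}$ at initialization:
\[
\rho(\theta) = \mathbb{E}_{x}\!\left[ \frac{\|f_{\theta}(x)\|}{\|x\|} \right], \qquad \text{with the target } \rho(\theta) \approx 1 .
\]
Values of $\rho$ well below $1$ shrink features toward the origin as layers are stacked, while values well above $1$ push them toward the boundary of the ball, where hyperbolic operations become numerically unstable. Whether $\|\cdot\|$ denotes the Euclidean norm in the ball or the hyperbolic distance to the origin is not specified in this section.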


\begin{figure}[t]
\floatconts
  {fig:weight_init}
  {\caption{\textbf{Effect of our initialization.} Left: we visualize the inputs and the output feature maps obtained from a transposed convolution layer. \citeauthor{shimizu2021hyperbolic} and our approach produce expressive features. Right: we show the output-to-input norm ratios after hyperbolic transposed convolutions for various feature ratios. Only our approach retains the desired ratio of 1.}}
  {%
    \begin{minipage}[t]{0.36\linewidth}
        
        \includegraphics[width=\linewidth]{figs/init_feat.pdf}
    \end{minipage}\hfill
    \begin{minipage}[t]{0.56\linewidth}
        
        \includegraphics[width=\linewidth]{figs/hconvT_shim_max_new5.pdf}
    \end{minipage}
  }
\end{figure}
