\section{Methods}
\textbf{Riemannian geometry.}
Riemannian geometry is the study of Riemannian manifolds, which are pairs $(\mathcal{M},\mathfrak{g})$ consisting of a smooth manifold $\mathcal{M}$ and a Riemannian metric $\mathfrak{g}$. Smooth manifolds are topological spaces that are locally similar enough to Euclidean space to allow the use of calculus on small neighborhoods, but which can differ tremendously from Euclidean space at larger scales. At each point $\boldsymbol{x}$ of $\mathcal{M}$, the set of velocities (directions together with speeds) with which we can travel from $\boldsymbol{x}$ forms a vector space $\mathcal{T}_{\boldsymbol{x}}\mathcal{M}$, called the tangent space at $\boldsymbol{x}$. Locally, this tangent space can be seen as the Euclidean approximation to the manifold at $\boldsymbol{x}$. The Riemannian metric $\mathfrak{g}$ assigns an inner product to each tangent space, which allows the computation of the lengths of curves on $\mathcal{M}$. Geodesics are special curves on the manifold that generalize the concept of straight lines. When it exists, a length-minimizing geodesic between two points $\boldsymbol{x}$ and $\boldsymbol{y}$ on the manifold is the shortest path between them.

Two important mappings on Riemannian manifolds are the exponential and logarithmic maps. The exponential map at some point $\boldsymbol{x}$ takes a tangent vector $\boldsymbol{v}$ from $\mathcal{T}_{\boldsymbol{x}} \mathcal{M}$ and outputs the point on the manifold at which we would arrive if we were to travel from $\boldsymbol{x}$ in the direction of $\boldsymbol{v}$ with speed $\sqrt{\mathfrak{g}(\boldsymbol{v}, \boldsymbol{v})}$ for one unit of time. The logarithmic map at $\boldsymbol{x}$ is the inverse of the exponential map wherever this inverse is defined, so it takes some point $\boldsymbol{y}$ and returns the tangent vector in $\mathcal{T}_{\boldsymbol{x}} \mathcal{M}$ that represents the direction and speed needed to arrive at $\boldsymbol{y}$ in a single unit of time. In what follows, we will treat hyperbolic space as a Riemannian manifold and use the tools from Riemannian geometry to enable computation within this space. For a comprehensive overview of Riemannian geometry, we refer the reader to \cite{lee2018introduction}.

\textbf{Poincar\'e ball model.}
This paper operates on the commonly used Poincar\'e ball model of hyperbolic space. For $n$-dimensional hyperbolic space with constant negative curvature $-c$, this is defined as the Riemannian manifold $(\mathbb{B}_{c}^{n}, \mathfrak{g}_{c})$, where
\begin{equation}
    \mathbb{B}_{c}^{n} = \Big\{\boldsymbol{x} \in \mathbb{R}^{n}: \|\boldsymbol{x}\|^{2} < \frac{1}{c}\Big\}, \quad \mathfrak{g}_{c} = (\lambda_{\boldsymbol{x}}^{c})^{2}I_{n}, \quad \lambda_{\boldsymbol{x}}^{c} = \frac{2}{1 - c\|\boldsymbol{x}\|^{2}},
\end{equation}
with $I_{n}$ being the $n$-dimensional identity matrix. The Poincar\'e ball model can be turned into a gyrogroup~\cite{ungar2022gyrovector} by endowing it with M\"obius addition, defined as
\begin{equation}
    \boldsymbol{x} \oplus_c \boldsymbol{y} =\frac{\left(1+2 c\langle\boldsymbol{x}, \boldsymbol{y}\rangle+c\|\boldsymbol{y}\|^2\right) \boldsymbol{x}+\left(1-c\|\boldsymbol{x}\|^2\right) \boldsymbol{y}}{1+2 c\langle\boldsymbol{x}, \boldsymbol{y}\rangle+c^2\|\boldsymbol{x}\|^2\|\boldsymbol{y}\|^2},
\end{equation}
where $\boldsymbol{x},\boldsymbol{y} \in \mathbb{B}_{c}^{n}$ and where $\|\cdot\|$ and $\langle\cdot, \cdot\rangle$ denote the Euclidean norm and inner product, respectively. Using the definition of M\"obius addition, the exponential and logarithmic maps can be written as
\begin{equation}
\label{eq:explog}
\begin{aligned}
\exp _{\boldsymbol{x}}^c(\boldsymbol{v}) & =\boldsymbol{x} \oplus_c\left(\tanh \left(\frac{\sqrt{c} \lambda_{\boldsymbol{x}}^c\|\boldsymbol{v}\|}{2}\right) \frac{\boldsymbol{v}}{\sqrt{c}\|\boldsymbol{v}\|}\right), \\
\log _{\boldsymbol{x}}^c(\boldsymbol{y}) & =\frac{2}{\sqrt{c} \lambda_{\boldsymbol{x}}^c} \tanh ^{-1}\left(\sqrt{c}\left\|-\boldsymbol{x} \oplus_c \boldsymbol{y}\right\|\right) \frac{-\boldsymbol{x} \oplus_c \boldsymbol{y}}{\left\|-\boldsymbol{x} \oplus_c \boldsymbol{y}\right\|},
\end{aligned}
\end{equation}
where $\boldsymbol{x},\boldsymbol{y} \in \mathbb{B}_{c}^{n}$ and $\boldsymbol{v} \in \mathcal{T}_{\boldsymbol{x}}\mathbb{B}_{c}^{n}$~\cite{ganea2018hyperbolic}. Furthermore, we can compute the distance between any two points $\boldsymbol{x},\boldsymbol{y} \in \mathbb{B}_{c}^{n}$ as 
\begin{equation}
\label{eq:hdist}
d_c(\boldsymbol{x}, \boldsymbol{y})=\frac{2}{\sqrt{c}} \tanh ^{-1}\left(\sqrt{c}\left\|-\boldsymbol{x} \oplus_c \boldsymbol{y}\right\|\right) .
\end{equation}
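As an illustrative special case (a worked check that we add here, not taken from the cited works), consider the distance to the origin. Setting $\boldsymbol{x} = \boldsymbol{0}$ in the definition of M\"obius addition leaves only the term $(1 - c\|\boldsymbol{0}\|^{2})\boldsymbol{y}$ in the numerator and $1$ in the denominator, so $-\boldsymbol{0} \oplus_c \boldsymbol{y} = \boldsymbol{y}$ and equation~\ref{eq:hdist} reduces to
\[
d_c(\boldsymbol{0}, \boldsymbol{y}) = \frac{2}{\sqrt{c}} \tanh^{-1}\left(\sqrt{c}\|\boldsymbol{y}\|\right).
\]
This quantity grows without bound as $\|\boldsymbol{y}\| \rightarrow 1/\sqrt{c}$, and since $\tanh^{-1}(z) \approx z$ for small $z$, it behaves like $2\|\boldsymbol{y}\|$ near the origin, in agreement with $\lambda_{\boldsymbol{0}}^{c} = 2$.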
We follow \citeauthor{shimizu2021hyperbolic,van2023poincare} to build a fully hyperbolic convolutional neural network. As a foundation, Poincar\'e multinomial logistic regression is defined by computing the score for each of $n$ classes for some $m$-dimensional input  $\boldsymbol{x} \in \mathbb{B}_{c}^{m}$ as
\begin{equation}
v_k(\boldsymbol{x})=\frac{2}{\sqrt{c}}\left\|\boldsymbol{z}_k\right\| \sinh ^{-1}\left(\lambda_{\boldsymbol{x}}^c\left\langle\sqrt{c} \boldsymbol{x}, \frac{\boldsymbol{z}_k}{\left\|\boldsymbol{z}_k\right\|}\right\rangle \cosh \left(2 \sqrt{c} r_k\right) -\left(\lambda_{\boldsymbol{x}}^c-1\right) \sinh \left(2 \sqrt{c} r_k\right)\right),
\end{equation}
where $\boldsymbol{z}_{k} \in \mathcal{T}_{0}\mathbb{B}_{c}^{m}$ and $r_{k} \in \mathbb{R}$ are the parameters for the $k$-th class. These scores are equivalent to the distances between the input $\boldsymbol{x}$ and the $n$ different Poincar\'e hyperplanes determined by the parameters $\{(\boldsymbol{z}_{k}, r_{k})\}_{k = 1}^{n}$. Here, $\boldsymbol{z}_{k}$ determines the orientation of the hyperplane while $r_{k}$ determines its offset with respect to the origin. A Poincar\'e fully connected layer mapping input $\boldsymbol{x} \in \mathbb{B}_{c}^{m}$ to $\mathbb{B}_{c}^{n}$ is in turn defined as
\begin{equation}
\label{eq:hypfc}
\boldsymbol{y}=\mathcal{F}^c(\boldsymbol{x} ; Z, \boldsymbol{r})=\frac{\boldsymbol{w}}{1+\sqrt{1+c\|\boldsymbol{w}\|^2}}, \quad
\boldsymbol{w}=\left(\frac{1}{\sqrt{c}} \sinh \left(\sqrt{c} v_k(\boldsymbol{x})\right)\right)_{k=1}^n,
\end{equation}
where the $v_{k}(\cdot)$ are the scores from the Poincar\'e multinomial logistic regression and where $Z = [\boldsymbol{z}_{1}|\ldots|\boldsymbol{z}_{n}] \in (\mathcal{T}_{0}\mathbb{B}_{c}^{m})^{n} = \mathbb{R}^{m \times n}$ and $\boldsymbol{r} = (r_{k})_{k=1}^{n} \in \mathbb{R}^{n}$ are the parameters of the layer. Given hyperbolic fully connected layers, \citeauthor{van2023poincare} provide formulations for 2D convolutions, batch normalization and the ReLU activation in hyperbolic space, along with their weight initialization. We use these blocks as a starting point to develop a fully Hyperbolic U-Net.
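
As a brief sanity check on equation~\ref{eq:hypfc} (a derivation we add for illustration, assuming only the definitions above), write $a = \sqrt{1 + c\|\boldsymbol{w}\|^{2}} \geq 1$, so that $c\|\boldsymbol{w}\|^{2} = (a - 1)(a + 1)$. Then
\[
c\|\boldsymbol{y}\|^{2} = \frac{c\|\boldsymbol{w}\|^{2}}{(1 + a)^{2}} = \frac{a - 1}{a + 1} < 1,
\]
so the output lies in $\mathbb{B}_{c}^{n}$ for any finite $\boldsymbol{w}$. Moreover, since $-\boldsymbol{0} \oplus_c \boldsymbol{y} = \boldsymbol{y}$, equation~\ref{eq:hdist} together with the identity $\cosh(2\theta) = (1 + \tanh^{2}\theta)/(1 - \tanh^{2}\theta)$ gives
\[
d_c(\boldsymbol{0}, \boldsymbol{y}) = \frac{2}{\sqrt{c}} \tanh^{-1}\sqrt{\frac{a - 1}{a + 1}} = \frac{1}{\sqrt{c}} \cosh^{-1}(a) = \frac{1}{\sqrt{c}} \sinh^{-1}\left(\sqrt{c}\|\boldsymbol{w}\|\right),
\]
that is, the hyperbolic distance of the layer output to the origin is determined by the Euclidean norm of $\boldsymbol{w}$ alone.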
%We use these building blocks and introduce additional operations such as 2D transposed convolutions and bilinear upsample in hyperbolic space and a suitable norm-preserving initialization to arrive at a Hyperbolic U-Net.

\subsection{Hyperbolic U-Net}

We consider the problem of image segmentation, where we are given an input image $\boldsymbol{x} \in \mathbb{R}^{H \times W \times 3}$ of height $H$ and width $W$. To each pixel $\boldsymbol{x}_{ij} \in \mathbb{R}^3$, $i = 1, \ldots, H$, $j = 1, \ldots, W$, we need to assign a label $\boldsymbol{y}_{ij} \in \{1, \ldots, C\}$, denoting one of $C$ classes. Let $f: \mathbb{R}^{H \times W \times 3} \rightarrow \mathbb{R}^{H \times W \times C}$ denote the function that maps the image to a probability distribution over the $C$ classes for each pixel. The U-Net architecture is highly effective at approximating this function~\cite{ronneberger2015u, isensee2021nnu}. Therefore, we strive to formulate a geometric equivalent of the U-Net architecture in the Poincar\'e ball model, which we name Hyperbolic U-Net.

U-Net typically consists of four encoder and four decoder blocks with skip connections from the encoder to the decoder. The encoder blocks comprise convolution layers, batch normalizations, ReLU activations, and max pooling layers. The decoder blocks are made of identical layers that invert these operations, where max pooling is replaced with either a transposed convolution or bilinear upsampling. To create a fully hyperbolic U-Net, all these operations need to be reformulated in hyperbolic space. Below, we outline how to formalize and construct (i) Poincar\'e 2D transposed convolutions and (ii) hyperbolic bilinear upsampling, and (iii) how to effectively initialize hyperbolic convolutional neural networks.

To embed Euclidean pixel vectors into hyperbolic space, we use the exponential map at the origin, $\exp _{\boldsymbol{0}}^c:\mathcal{T}_{\boldsymbol{0}}\mathbb{B}_{c}^{n}\rightarrow \mathbb{B}_{c}^{n}$. Thus, each pixel is mapped as $\boldsymbol{\hat{x}}_{ij} = \exp_{\boldsymbol{0}}^{c}(\boldsymbol{x}_{ij}) \in \mathbb{B}_{c}^{3}$. The hyperbolic network then produces, for each pixel location $(i,j)$, an output $\boldsymbol{\hat{z}}_{ij} \in \mathbb{B}_{c}^{C}$ in the Poincar\'e ball. To convert these hyperbolic outputs into Euclidean logits, we apply the logarithmic map at the origin, $\log _{\boldsymbol{0}}^c:\mathbb{B}_{c}^{n} \rightarrow \mathcal{T}_{\boldsymbol{0}}\mathbb{B}_{c}^{n}$, which yields $\boldsymbol{l}_{ij} = \log _{\boldsymbol{0}}^c(\boldsymbol{\hat{z}}_{ij}) \in \mathbb{R}^{C}$.
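For concreteness, both origin maps admit simple expressions (a specialization of equation~\ref{eq:explog} that we spell out for illustration). With $\lambda_{\boldsymbol{0}}^{c} = 2$ and $\boldsymbol{0} \oplus_c \boldsymbol{u} = \boldsymbol{u}$,
\[
\exp_{\boldsymbol{0}}^{c}(\boldsymbol{v}) = \tanh\left(\sqrt{c}\|\boldsymbol{v}\|\right) \frac{\boldsymbol{v}}{\sqrt{c}\|\boldsymbol{v}\|}, \qquad
\log_{\boldsymbol{0}}^{c}(\boldsymbol{y}) = \tanh^{-1}\left(\sqrt{c}\|\boldsymbol{y}\|\right) \frac{\boldsymbol{y}}{\sqrt{c}\|\boldsymbol{y}\|},
\]
for $\boldsymbol{v} \neq \boldsymbol{0}$ and $\boldsymbol{y} \neq \boldsymbol{0}$, with $\exp_{\boldsymbol{0}}^{c}(\boldsymbol{0}) = \boldsymbol{0}$ and $\log_{\boldsymbol{0}}^{c}(\boldsymbol{0}) = \boldsymbol{0}$. Both maps only rescale their argument radially, and since $\tanh(\cdot) < 1$ we have $\|\exp_{\boldsymbol{0}}^{c}(\boldsymbol{v})\| < 1/\sqrt{c}$, so every embedded pixel lies strictly inside the ball regardless of its intensity scale.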

\subsection{Poincar\'e Transposed Convolutions}
Image-to-image networks require upscaling. Here, we formalize the 2D transposed convolution operation in the Poincar\'e ball model by extending the geometric principles of the Poincar\'e convolution layer. Let the input image $\boldsymbol{x}$ have pixel values $\boldsymbol{x}_{kl} \in \mathbb{B}_{c}^{C_{in}}, \quad k = 1, ..., H_{in}, \quad l = 1, ..., W_{in},$
where $C_{in}$ is the number of input channels and where $H_{in}$ and $W_{in}$ are the height and width of the image, respectively. Then we can define a 2D Poincar\'e transposed convolution operation with $C_{out}$ output channels with pixel values $
    \boldsymbol{h}_{ij} \in \mathbb{B}_{c}^{C_{out}}, \quad i = 1, ..., H_{out}, \quad j = 1, ..., W_{out}.$
For each input pixel $\boldsymbol{x}_{kl}$, we compute an output receptive field of size $K \times K$, with $K$ odd, that determines the output pixels $\boldsymbol{h}_{ij}$ where
$
    k  - \lfloor\frac{K}{2}\rfloor \leq i \leq k + \lfloor\frac{K}{2}\rfloor, \quad l - \lfloor\frac{K}{2}\rfloor \leq j \leq l + \lfloor\frac{K}{2}\rfloor.
$
We denote the output receptive field centered at $(k,l)$ by $\boldsymbol{Y}_{kl}$. Analogous to the Euclidean transposed convolution, the output values of this receptive field are computed by applying a Poincar\'e fully connected layer $\mathcal{F}^{c}$ with parameters $Z$ and $\boldsymbol{r}$, as defined in equation~\ref{eq:hypfc}, to the input pixel $\boldsymbol{x}_{kl}$ and then splitting the output into $K^2$ individual vectors in $\mathbb{B}_c^{C_{out}}$, so
$
    \boldsymbol{Y}_{kl} = \mathcal{S}_{K^2 C_{out} \rightarrow C_{out}} (\mathcal{F}^{c}(\boldsymbol{x}_{kl}; Z, \boldsymbol{r})).
$
Note that, as with the Poincar\'e convolution operation introduced by \citeauthor{van2023poincare}, ordinary splitting is inappropriate for vectors on the Poincar\'e ball, as it can result in vectors outside the manifold. Therefore, we employ the $\beta$-split operation defined by \citeauthor{shimizu2021hyperbolic} as follows:
\begin{equation}
    \mathcal{S}_{K^2 C_{out} \rightarrow C_{out}} (\mathbf{z}) = \Big(\exp_{\mathbf{0}}^c\big(\beta_{C_{out}} \beta_{K^2 C_{out}}^{-1} \mathbf{v}_i\big)\Big)_{i=1}^{K^2}, \quad (\mathbf{v}_1^T, \ldots, \mathbf{v}_{K^2}^T)^T = \log_{\mathbf{0}}^c (\mathbf{z}),
\end{equation}
where $\beta_n = B(\frac{n}{2}, \frac{1}{2})$ with $B$ being the beta function. This splitting operation takes as input a single hyperbolic vector in $\mathbb{B}_{c}^{K^2 C_{out}}$ and splits it into $K^2$ vectors in $\mathbb{B}_c^{C_{out}}$ such that the average of the Poincar\'e norms of the output vectors is equal to the Poincar\'e norm of the input.
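To give a sense of the magnitude of the scaling factor, consider a hypothetical layer with $K = 3$ and $C_{out} = 1$ (values chosen purely for illustration). Using $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ with $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, $\Gamma(\tfrac{9}{2}) = \tfrac{105}{16}\sqrt{\pi}$, $\Gamma(1) = 1$ and $\Gamma(5) = 24$, we obtain
\[
B\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \pi, \qquad
B\left(\tfrac{9}{2}, \tfrac{1}{2}\right) = \frac{35\pi}{128}, \qquad
\beta_{C_{out}} \beta_{K^2 C_{out}}^{-1} = \frac{128}{35} \approx 3.66.
\]
In this example, each of the nine scalar tangent components is scaled up by a factor of roughly $3.66$ before being mapped back onto the ball, and the fully connected layer of equation~\ref{eq:hypfc} is applied with $m = C_{in}$ and $n = K^2 C_{out} = 9$, so that $Z \in \mathbb{R}^{C_{in} \times 9}$.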

\subsection{Hyperbolic Bilinear Upsampling}

While transposed convolutions can be employed to learn task-specific upsampling filters, they can also be computationally intensive and increase the number of learnable parameters. This is often mitigated in U-Net by replacing them with bilinear upsampling. Therefore, we introduce the hyperbolic analogue of bilinear upsampling for images in the Poincar\'e ball model. Euclidean bilinear upsampling is performed using linear interpolation first in one direction, and then again in the other direction. In the hyperbolic setting, we retain the same grid structure and interpolation weights, but we replace all Euclidean interpolations with their geodesic counterparts in the Poincar\'e ball.

For an upsampling factor $s$ and input size $(H_{in}, W_{in})$, each output coordinate $(i,j)$ corresponds to a fractional location $(u,v)$ in the input image, where $u = \frac{i}{s}, v = \frac{j}{s}$. Let $(k,l), (k+1,l), (k,l+1),$ and $(k+1,l+1)$ be the four neighboring input pixels surrounding $(u,v)$, and let the Euclidean interpolation weights be $\alpha = u - k$, $\beta = v - l$. In the Poincar\'e ball model, we perform a repeated geodesic interpolation, where each pairwise linear interpolation is replaced by its geodesic counterpart, defined by the exponential and the logarithmic maps (see equation~\ref{eq:explog}) in the relevant tangent space. For two hyperbolic vectors $\boldsymbol{a}, \boldsymbol{b} \in \mathbb{B}_{c}^{d}$, their weighted geodesic interpolation with weight $t \in [0, 1]$ is
$
    \gamma(\boldsymbol{a}, \boldsymbol{b}; t) = \exp_{\boldsymbol{a}}^{c}(t \log_{\boldsymbol{a}}^{c}(\boldsymbol{b})).
$
Using the operator $\gamma$, the hyperbolic bilinear upsampling produces each output pixel as 
\begin{equation}
    \boldsymbol{h}_{ij} = \gamma\Big(\gamma\big(\boldsymbol{x}_{k,l},\boldsymbol{x}_{k+1,l}; \alpha\big), \gamma\big(\boldsymbol{x}_{k,l+1},\boldsymbol{x}_{k+1,l+1}; \alpha\big); \beta\Big),
\end{equation}
which lies in $\mathbb{B}_{c}^{C_{in}}$ by construction, for $i = 1, \ldots, sH_{in}$ and $j = 1, \ldots, sW_{in}$.
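Two properties of $\gamma$ illustrate why this is a natural analogue of linear interpolation (stated here as a worked check of ours, under the usual convention $\exp_{\boldsymbol{a}}^{c}(\boldsymbol{0}) = \boldsymbol{a}$). First, the endpoints are recovered,
\[
\gamma(\boldsymbol{a}, \boldsymbol{b}; 0) = \exp_{\boldsymbol{a}}^{c}(\boldsymbol{0}) = \boldsymbol{a}, \qquad
\gamma(\boldsymbol{a}, \boldsymbol{b}; 1) = \exp_{\boldsymbol{a}}^{c}\left(\log_{\boldsymbol{a}}^{c}(\boldsymbol{b})\right) = \boldsymbol{b},
\]
so that for $\alpha = \beta = 0$ the output pixel equals $\boldsymbol{x}_{k,l}$, as in the Euclidean case. Second, specializing equation~\ref{eq:explog} to $\boldsymbol{a} = \boldsymbol{0}$, where $\lambda_{\boldsymbol{0}}^{c} = 2$ and $\boldsymbol{0} \oplus_c \boldsymbol{u} = \boldsymbol{u}$, gives for $\boldsymbol{b} \neq \boldsymbol{0}$
\[
\gamma(\boldsymbol{0}, \boldsymbol{b}; t) = \tanh\left(t \tanh^{-1}\left(\sqrt{c}\|\boldsymbol{b}\|\right)\right) \frac{\boldsymbol{b}}{\sqrt{c}\|\boldsymbol{b}\|},
\]
and inserting this into equation~\ref{eq:hdist} yields $d_c(\boldsymbol{0}, \gamma(\boldsymbol{0}, \boldsymbol{b}; t)) = t\, d_c(\boldsymbol{0}, \boldsymbol{b})$: the interpolated point moves along the geodesic at a rate proportional to $t$ in hyperbolic rather than Euclidean distance.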

\subsection{Newton-Scaled Weight Initialization}

\citeauthor{shimizu2021hyperbolic} propose an initialization for the Poincar\'e fully connected and convolutional layers that produces expressive features but does not ensure norm preservation. The identity initialization proposed by \citeauthor{van2023poincare} solves this when the input dimension $m$ does not exceed the output dimension $n$, i.e., for $m \leq n$. However, U-Net comprises encoder and decoder blocks, and for the decoder, the number of input features exceeds the number of output features. In these layers, both initializations fail to preserve the norm, leading to vanishing or exploding hyperbolic norms. Moreover, identity initialization provides too little feature diversity, which can slow convergence and degrade performance.

We introduce a Newton-scaled hyperbolic weight initialization. We first apply a standard Euclidean initialization (e.g., Kaiming or orthogonal) to the Poincar\'e fully connected layers in hyperbolic convolution or transposed convolution operations, and subsequently rescale each weight matrix by a scalar $s > 0$ to enforce hyperbolic norm preservation.
For the Poincar\'e fully connected layer $\boldsymbol{y}=\mathcal{F}^c(\boldsymbol{x} ; Z, \boldsymbol{r})$ mapping inputs $\boldsymbol{x} \in \mathbb{B}_{c}^{m}$ to $\boldsymbol{y} \in \mathbb{B}_{c}^{n}$, we seek a scalar $s$ such that 
$
    \mathbb{E}_{\boldsymbol{x}}\Big[d_{c}\big(\mathcal{F}^c(\boldsymbol{x} ; sZ, \boldsymbol{r}), 0\big)^{2}\Big] \approx \mathbb{E}_{\boldsymbol{x}}\Big[d_{c}(\boldsymbol{x}, 0)^{2}\Big],
$
ensuring that the average squared hyperbolic distance to the origin (equation~\ref{eq:hdist}) is preserved across layers. We define
\begin{equation}
    g(s) = \mathbb{E}_{\boldsymbol{x}}\Big[d_{c}(\mathcal{F}^c(\boldsymbol{x} ; sZ, \boldsymbol{r}), 0)^{2}\Big] -\mathbb{E}_{\boldsymbol{x}}\Big[d_{c}(\boldsymbol{x}, 0)^{2}\Big],
\end{equation}
where in practice both expectations are approximated using a randomly sampled batch of training data, and we approximate the root of $g(s)$ using Newton's method. The derivative $g'(s)$ is obtained via automatic differentiation, enabling an efficient and stable layer-wise optimization procedure. The entire process is as follows: (i) each hyperbolic (transposed) convolutional layer is initialized using either Kaiming or orthogonal initialization applied to its Euclidean parameter $Z$; (ii) during a single forward pass with a randomly sampled batch of inputs, we record the features passed to each hyperbolic layer; and (iii) for each individual layer, we compute the scalar $s$ solving $g(s)=0$ using 5 to 10 Newton iterations and apply it directly to the weight matrix of the layer, $Z \leftarrow sZ$.
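The root-finding step can be written out explicitly. The following is a sketch under our own notational assumptions: $N$ denotes the batch size, $\boldsymbol{x}_{1}, \ldots, \boldsymbol{x}_{N}$ the recorded layer inputs, $\hat{g}$ the batch estimate of $g$, and $s_{0} = 1$ an assumed starting value that is not prescribed above.
\[
\hat{g}(s) = \frac{1}{N}\sum_{i=1}^{N} d_{c}\big(\mathcal{F}^c(\boldsymbol{x}_{i} ; sZ, \boldsymbol{r}), 0\big)^{2} - \frac{1}{N}\sum_{i=1}^{N} d_{c}(\boldsymbol{x}_{i}, 0)^{2}, \qquad
s_{t+1} = s_{t} - \frac{\hat{g}(s_{t})}{\hat{g}'(s_{t})}.
\]
Because $\boldsymbol{z}_{k}$ enters the score $v_{k}$ only through $\|\boldsymbol{z}_{k}\|$ and $\boldsymbol{z}_{k}/\|\boldsymbol{z}_{k}\|$, rescaling by $s > 0$ gives $v_{k}(\boldsymbol{x}; s\boldsymbol{z}_{k}, r_{k}) = s\, v_{k}(\boldsymbol{x}; \boldsymbol{z}_{k}, r_{k})$. In principle, each evaluation of $\hat{g}$ therefore only requires re-applying the $\sinh$ in equation~\ref{eq:hypfc} to the stored scores, while the second sum is independent of $s$ and can be computed once. Since the components $|\sinh(\sqrt{c}\, s\, v_{k})|$ are non-decreasing in $s$ and the distance to the origin increases with $\|\boldsymbol{w}\|$, $\hat{g}$ is non-decreasing in $s$ with $\hat{g}(0) \leq 0$, which suggests that the root is unique whenever the scores are not all zero.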

This procedure produces an initialization that is as expressive as Kaiming or orthogonal initialization while enforcing empirical hyperbolic norm preservation, even when $m > n$, such as in transposed convolutions. In practice, this leads to more stable training and faster convergence in Hyperbolic U-Net through improved feature diversity, while avoiding the exploding-norm issues observed with identity initialization in upsampling layers.
