\section{Perspective Projection and 2D/3D Registration}
\subsection{Perspective Projection}
\label{appendix:perspective_projection}
We adopt the same C-arm geometry as in the main text and in prior work on perspective-projection registration \cite{suh20252d, mo2025enhancedlandmarkdetectionmodel}. The $y$-axis is aligned with the source–detector direction, the detector lies in the plane $y = 0$, and the X-ray source is located at $(0, \mathrm{SDD}, 0)$, where $\mathrm{SDD}$ denotes the source-to-detector distance. The source-to-volume distance is denoted by $\mathrm{SVD}$ and the volume-to-detector distance by $\mathrm{VDD}$, so that, along the central ($y$) axis, $\mathrm{SDD} = \mathrm{SVD} + \mathrm{VDD}$.

For each patient, a fixed offset
\begin{align}
\mathbf{c}_0
=
\begin{bmatrix}
0 & \mathrm{VDD} & -m_z
\end{bmatrix}^\top
\end{align}
accounts for the position of the pelvis relative to the C–arm isocenter and for a patient-specific extra-planar translation $m_z$ (in mm) along the $z$-axis that aligns the CT volume center with the anatomical pelvic center. All 3D landmarks $\mathbf{X}_i \in \mathbb{R}^3$ are first expressed in physical units (mm) in the CT coordinate system and then interpreted in this C–arm frame.

A rigid-body pose is parameterized by three Euler angles $\boldsymbol{r} = (r_x,r_y,r_z)$ (in degrees) and a 3D translation $\boldsymbol{t} = (t_x,t_y,t_z)$ (in mm). The corresponding rotation matrix is
\begin{align}
    R(\boldsymbol{r}) = R_y(r_y)\,R_x(r_x)\,R_z(r_z),
\end{align}
where $R_x, R_y, R_z$ are standard rotations about the $x, y, z$ axes, respectively. This Euler-angle convention and rotation order match the pose parameterization used in DiffDRR \cite{gopalakrishnan2022fast}, ensuring consistency between DRR generation and landmark-based registration. We rotate about the fixed point $\mathbf{c}_0$, so that the transformed 3D
landmark is
\begin{align}
\label{eq:3d-transform}
\mathbf{X}_i^{\mathrm{T}}(\boldsymbol{r},\boldsymbol{t})
=
R(\boldsymbol{r})\bigl(\mathbf{X}_i - \mathbf{c}_0\bigr)
+ \mathbf{c}_0 + \tilde{\boldsymbol{t}},
\end{align}
where the effective translation $\tilde{\boldsymbol{t}}$ is obtained from $\boldsymbol{t}$ by applying the sign convention of the imaging coordinate system (e.g.\ whether a positive $t_y$ moves the volume toward or away from the detector).
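For concreteness, the elementary rotations can be written out explicitly. The matrices below assume the common right-handed (active) convention, with each angle converted from degrees to radians; whether this matches the DiffDRR sign convention in every detail should be checked against its implementation.
\begin{align*}
R_x(\phi) &=
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos\phi & -\sin\phi \\
0 & \sin\phi & \cos\phi
\end{bmatrix}, \\
R_y(\phi) &=
\begin{bmatrix}
\cos\phi & 0 & \sin\phi \\
0 & 1 & 0 \\
-\sin\phi & 0 & \cos\phi
\end{bmatrix}, \\
R_z(\phi) &=
\begin{bmatrix}
\cos\phi & -\sin\phi & 0 \\
\sin\phi & \cos\phi & 0 \\
0 & 0 & 1
\end{bmatrix}.
\end{align*}
Since $R(\boldsymbol{r}) = R_y(r_y)\,R_x(r_x)\,R_z(r_z)$ acts on column vectors, the $z$-rotation is applied first and the $y$-rotation last. Setting $\mathbf{X}_i = \mathbf{c}_0$ in \eqref{eq:3d-transform} gives $\mathbf{X}_i^{\mathrm{T}} = \mathbf{c}_0 + \tilde{\boldsymbol{t}}$, i.e.\ the rotation center is displaced only by the translation.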

Given a transformed 3D point $\mathbf{X}_i^{\mathrm{T}} = (X_i^{\mathrm{T}},Y_i^{\mathrm{T}},Z_i^{\mathrm{T}})^\top$, its perspective projection onto the detector is given by the standard cone–beam model with source at $(0,\mathrm{SDD},0)$ and detector plane $y=0$. The scale factor along the ray is
\begin{align}
    \gamma_i = \frac{\mathrm{SDD}}{\mathrm{SDD} - Y_i^{\mathrm{T}}},
\end{align}
and the corresponding 2D detector coordinates are
\begin{align}
\label{eq:projection}
\mathbf{p}_i(\boldsymbol{r},\boldsymbol{t}) =
\begin{bmatrix}
u_i \\ v_i
\end{bmatrix} = \gamma_i
\begin{bmatrix}
X_i^{\mathrm{T}} \\ Z_i^{\mathrm{T}}
\end{bmatrix}.
\end{align}
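The scale factor $\gamma_i$ follows from intersecting the ray that leaves the source and passes through $\mathbf{X}_i^{\mathrm{T}}$ with the detector plane. Writing $\mathbf{s} = (0, \mathrm{SDD}, 0)^\top$ for the source position and parameterizing the ray by $\alpha \ge 0$,
\begin{align*}
\mathbf{x}(\alpha) = \mathbf{s} + \alpha\bigl(\mathbf{X}_i^{\mathrm{T}} - \mathbf{s}\bigr),
\qquad
[\mathbf{x}(\alpha)]_y = \mathrm{SDD} + \alpha\bigl(Y_i^{\mathrm{T}} - \mathrm{SDD}\bigr) = 0
\;\Longrightarrow\;
\alpha = \frac{\mathrm{SDD}}{\mathrm{SDD} - Y_i^{\mathrm{T}}} = \gamma_i .
\end{align*}
At this $\alpha$ the remaining components are $\gamma_i X_i^{\mathrm{T}}$ and $\gamma_i Z_i^{\mathrm{T}}$, which is exactly \eqref{eq:projection}. The derivation assumes $Y_i^{\mathrm{T}} < \mathrm{SDD}$, i.e.\ the point lies on the detector side of the source. A landmark at the height of the pelvic center, $Y_i^{\mathrm{T}} = \mathrm{VDD}$, is therefore magnified by $\mathrm{SDD}/(\mathrm{SDD} - \mathrm{VDD})$.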
In practice, $(u_i,v_i)$ are then shifted and scaled to the discrete pixel
grid (e.g.\ with the image center at $(W/2,H/2)$), but this is an affine
post-processing step and does not change the underlying projective geometry.
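The projection and the affine pixel mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, the pixel spacing is assumed isotropic (mm per pixel), and any axis flips a real detector may require are not modeled.

```python
from typing import Tuple


def project(point: Tuple[float, float, float], sdd: float) -> Tuple[float, float]:
    """Cone-beam projection of a C-arm-frame point (X, Y, Z) in mm onto y = 0.

    The source sits at (0, sdd, 0); returns detector coordinates (u, v) in mm.
    """
    x, y, z = point
    if y >= sdd:
        raise ValueError("point must satisfy Y < SDD (detector side of the source)")
    gamma = sdd / (sdd - y)  # scale factor along the source ray
    return gamma * x, gamma * z


def detector_to_pixel(
    u: float, v: float, width: int, height: int, pixel_spacing: float
) -> Tuple[float, float]:
    """Hypothetical affine mm-to-pixel map with the detector origin at (W/2, H/2).

    Assumes isotropic pixel spacing (mm/pixel); axis flips are deliberately omitted.
    """
    return u / pixel_spacing + width / 2, v / pixel_spacing + height / 2
```

For example, with $\mathrm{SDD} = 1000$ mm, `project((10.0, 500.0, -5.0), 1000.0)` returns `(20.0, -10.0)`: a point halfway between source and detector is magnified by a factor of 2.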

Equations~\eqref{eq:3d-transform} and~\eqref{eq:projection} together define
the forward model
\begin{align}
\pi(\boldsymbol{r},\boldsymbol{t}, \mathbf{X}_i) = \mathbf{p}_i(\boldsymbol{r},\boldsymbol{t}),    
\end{align}
which is the explicit form of $\pi(\theta, \mathbf{X}_c)$ used in the main
text.
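The composition $\pi$ can be sketched in dependency-free Python as follows. The sketch assumes right-handed elementary rotations, takes the effective translation $\tilde{\boldsymbol{t}}$ as already expressed in C–arm coordinates (the sign convention itself is not modeled), and returns detector coordinates in mm before any pixel mapping. All names are hypothetical; this is not the DiffDRR implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _rot_x(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def _rot_y(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def _rot_z(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation(r_deg: Sequence[float]) -> Matrix:
    """R(r) = R_y(r_y) R_x(r_x) R_z(r_z) for Euler angles (r_x, r_y, r_z) in degrees."""
    rx, ry, rz = (math.radians(a) for a in r_deg)
    return _matmul(_rot_y(ry), _matmul(_rot_x(rx), _rot_z(rz)))


def forward_model(
    r_deg: Sequence[float],
    t_eff: Sequence[float],
    point: Sequence[float],
    c0: Sequence[float],
    sdd: float,
) -> Tuple[float, float]:
    """pi(r, t, X): rigid motion about c0 followed by cone-beam projection onto y = 0."""
    R = rotation(r_deg)
    d = [point[k] - c0[k] for k in range(3)]  # X_i - c_0
    # X^T = R (X - c0) + c0 + t~
    xt = [sum(R[i][k] * d[k] for k in range(3)) + c0[i] + t_eff[i] for i in range(3)]
    if xt[1] >= sdd:
        raise ValueError("transformed point must satisfy Y < SDD")
    gamma = sdd / (sdd - xt[1])
    return gamma * xt[0], gamma * xt[2]
```

Here `c0` would be built per patient as $(0, \mathrm{VDD}, -m_z)$. As a quick sanity check, any rotation leaves the projection of `c0` itself unchanged, because only $\tilde{\boldsymbol{t}}$ moves the rotation center.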

\subsection{Landmark-based 2D/3D Registration}
\label{appendix:2D_3D_registration}
For each fluoroscopy image, we are given a set of observed 2D landmark coordinates $\{\mathbf{y}_i\}_{i \in \mathcal{V}}$, where
$\mathcal{V}$ indexes the visible landmarks after applying the chosen visibility
and uncertainty filters. The corresponding 3D CT landmarks
$\{\mathbf{X}_i\}_{i \in \mathcal{V}}$ are fixed and known.

The landmark-based 2D/3D registration problem is formulated as a non-linear least-squares optimization over the 6-DOF pose $\xi = (\boldsymbol{r},\boldsymbol{t})$:
\begin{align}
\label{eq:appendix-reg}
\xi^\ast
=
\arg\min_{\xi}
\sum_{i \in \mathcal{V}}
\left\lVert
\pi(\xi,\mathbf{X}_i)
- \mathbf{y}_i
\right\rVert_2^{2}.    
\end{align}
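In practice, landmarks with missing or invalid 2D coordinates are excluded
from $\mathcal{V}$. Because each landmark contributes two scalar residuals and
the pose has six unknowns, at least three valid landmarks ($2|\mathcal{V}| \ge 6$)
are required to solve for the 6-DOF pose.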

Let
\begin{align}
\mathbf{r}(\xi) =
\bigl[
\mathbf{p}_1(\xi) - \mathbf{y}_1;\;
\mathbf{p}_2(\xi) - \mathbf{y}_2;\;
\dots;\;
\mathbf{p}_{|\mathcal{V}|}(\xi) - \mathbf{y}_{|\mathcal{V}|}
\bigr]
\in \mathbb{R}^{2|\mathcal{V}|}
\end{align}
denote the stacked residual vector, where $\mathbf{p}_i(\xi)$ is defined by \eqref{eq:3d-transform}–\eqref{eq:projection}. Eq.~\eqref{eq:appendix-reg} corresponds to minimizing $\|\mathbf{r}(\xi)\|_2^2$.
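The visibility filtering and the stacked residual can be sketched in Python. The forward model is passed in as a callable standing in for $\pi(\xi, \mathbf{X}_i)$; the names `valid_landmarks`, `stacked_residuals`, and `squared_norm`, and the use of dictionaries keyed by landmark index, are illustrative assumptions rather than the pipeline used here.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]
# forward(pose, X) -> (u, v); stands in for pi(xi, X_i).
ForwardModel = Callable[[Sequence[float], Point3], Point2]


def valid_landmarks(observed_2d: Dict[int, Optional[Point2]], min_count: int = 3) -> List[int]:
    """Indices V of landmarks whose 2D coordinates exist and are finite."""
    visible = sorted(
        i for i, y in observed_2d.items()
        if y is not None and all(math.isfinite(c) for c in y)
    )
    if len(visible) < min_count:
        # Two residuals per landmark vs. six pose unknowns: need at least 3 landmarks.
        raise ValueError(f"need at least {min_count} valid landmarks, got {len(visible)}")
    return visible


def stacked_residuals(
    forward: ForwardModel,
    pose: Sequence[float],
    landmarks_3d: Dict[int, Point3],
    observed_2d: Dict[int, Optional[Point2]],
    visible: Sequence[int],
) -> List[float]:
    """r(xi) = [p_1 - y_1; ...; p_|V| - y_|V|], flattened to 2|V| floats."""
    res: List[float] = []
    for i in visible:
        u, v = forward(pose, landmarks_3d[i])
        y_u, y_v = observed_2d[i]
        res.extend((u - y_u, v - y_v))
    return res


def squared_norm(residuals: Sequence[float]) -> float:
    """||r(xi)||_2^2, the least-squares objective."""
    return sum(r * r for r in residuals)
```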

The pose $\xi^\ast$ is obtained using a Levenberg–Marquardt–type least-squares solver \cite{gavin2019levenberg}, initialized from a neutral pose (zero rotation and translation). For each test image, the optimization is run twice:
\begin{itemize}
  \item with $\mathcal{V} = \mathcal{V}_0$, the set of all landmarks, and
  \item with $\mathcal{V} = \mathcal{V}_{\text{filt}}$, the subset obtained after discarding the $K$ most epistemically uncertain landmarks.
\end{itemize}
The resulting poses are denoted $\xi^\ast_{\text{all}}$ and $\xi^\ast_{\text{filt}}$, respectively. Their rotation and translation components are compared to the ground-truth C–arm pose to quantify how uncertainty-aware landmark selection influences registration accuracy, as reported in Section \ref{section:results}.
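A minimal, dependency-free sketch of a Levenberg–Marquardt loop and of the uncertainty-based landmark filter follows. It assumes Levenberg-style damping $(J^\top J + \lambda I)\,\Delta\xi = -J^\top \mathbf{r}$, a forward-difference Jacobian, and a tenfold increase or decrease of $\lambda$ on rejected or accepted steps. These choices, the names `levenberg_marquardt` and `drop_most_uncertain`, and the per-landmark uncertainty scores passed in as a dictionary are illustrative assumptions, not the exact solver or filter used in the experiments.

```python
from typing import Callable, Dict, List, Sequence

ResidualFn = Callable[[List[float]], List[float]]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= f * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        x[row] = (m[row][n] - sum(m[row][c] * x[c] for c in range(row + 1, n))) / m[row][row]
    return x


def levenberg_marquardt(
    residual_fn: ResidualFn,
    x0: Sequence[float],
    max_iter: int = 200,
    lam: float = 1e-3,
    h: float = 1e-6,
    tol: float = 1e-14,
) -> List[float]:
    """Minimize ||residual_fn(x)||^2 with damped Gauss-Newton steps."""
    x = [float(v) for v in x0]
    n = len(x)
    r = residual_fn(x)
    cost = sum(v * v for v in r)
    for _ in range(max_iter):
        # Forward-difference Jacobian, stored column-wise: cols[k][j] = d r_j / d x_k.
        cols = []
        for k in range(n):
            xp = x[:]
            xp[k] += h
            cols.append([(a - b) / h for a, b in zip(residual_fn(xp), r)])
        jtj = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
        jtr = [sum(a * b for a, b in zip(ci, r)) for ci in cols]
        accepted = False
        while lam < 1e12:
            damped = [[jtj[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
            dx = _solve(damped, [-g for g in jtr])
            x_new = [a + b for a, b in zip(x, dx)]
            r_new = residual_fn(x_new)
            cost_new = sum(v * v for v in r_new)
            if cost_new < cost:
                accepted = True
                break
            lam *= 10.0  # reject: shift toward gradient descent
        if not accepted:
            break  # no damping level tried reduces the cost
        small_gain = cost - cost_new < tol * (1.0 + cost)
        x, r, cost = x_new, r_new, cost_new
        lam = max(lam / 10.0, 1e-12)  # accept: shift toward Gauss-Newton
        if small_gain or cost < tol:
            break
    return x


def drop_most_uncertain(visible: Sequence[int], uncertainty: Dict[int, float], k: int) -> List[int]:
    """V_filt: remove the k landmarks with the largest uncertainty score from V_0."""
    if k <= 0:
        return list(visible)
    ranked = sorted(visible, key=lambda i: uncertainty[i], reverse=True)
    dropped = set(ranked[:k])
    return [i for i in visible if i not in dropped]
```

For registration, `residual_fn` would map $\xi = (r_x, r_y, r_z, t_x, t_y, t_z)$ to the stacked residual $\mathbf{r}(\xi)$, and the two runs would both start from `x0 = [0.0] * 6`, once with $\mathcal{V}_0$ and once with `drop_most_uncertain(V_0, scores, K)`. Because angles (degrees) and translations (mm) share a single damping term here, a diagonally scaled (Marquardt) variant or a library solver may behave better in practice.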