\section{Related Work}
\begin{table}
\centering
\begin{tabular}{|c|c|c|c|}
%\cline{2-4}
\hline
\multirow{2}*{Method} & Mean & Median Rot. & Median Trans. \\
    &  Runtime (s) & Error (deg) & Error (mm) \\ \hline

Intensity ($512\times 512 \text{ px}^2$) & 95          & 54.98                    & 27.58                  \\ \hline
Intensity ($100\times 100 \text{ px}^2$) & 6.2           & 70.41                    & 33.49                  \\ \hline
Landmark+PnP [Baseline]  & 0.1 & 12.96                     & 32.70                   \\ \hline
Weighted Landmark+PnP [\textbf{Prop.}] & 0.9 & 2.73                     & 6.97                   \\ \hline
\end{tabular}
% \caption{Comparison of intensity-based and landmark-based registration methods, evaluated with all initial pose parameters set to zero. The intensity approach uses DiffDRR with NCC as the projection metric with projection image size of 512x512, while landmark-based methods estimate pose from detected landmarks, with or without MC dropout based uncertainty ($S=100$). All landmark methods use PnP for final pose computation.}
\caption{Comparison of intensity- and landmark-based methods in terms of mean total registration time and median rotation and translation error. The Intensity method uses the DiffDRR \cite{gopalakrishnan2022fast} projection metric at two 2D image sizes, while the Landmark+PnP methods use a U-Net landmark annotator whose output is fed into a direct pose optimization, either without or with our proposed weights (cf. Finetune + Test Time CW in Table~\ref{tab:pose_comparison_w_nograd}). The weights are estimated with MC dropout ($S=100$).}%, executing all stochastic forward passes simultaneously in a single parallelized batch to optimize inference efficiency.}
\label{tab:intensity_vs_landmark}
\end{table}
% \begin{figure}
%     \centering
%     \includegraphics[width=0.32\linewidth]{5_results/figures/MC_overlay/0105_uncertainty.png}
%     \includegraphics[width=0.32\linewidth]{5_results/figures/MC_overlay/0106_uncertainty.png}
%     \includegraphics[width=0.32\linewidth]{5_results/figures/MC_overlay/0121_uncertainty.png}
%     \caption{Visualization of MC dropout prediction with dropout rate of 0.1.}
%     \label{fig:mc_prediction_overlay}
% \end{figure}


%%%%%%%%%
%%%%\BlueComment{Is this necessary?} Automatic detection of anatomical landmarks in radiographs and fluoroscopy is well studied, with approaches ranging from heatmap regression, direct coordinate regression, and dilation-erosion based label augmentation \cite{schwendicke2021deep, suh2023dilation}. These methods typically use encoder-decoder CNNs such as U-Nets and their variants and are often heavily augmented by image augmentation \CiteLater{spatial transformers 2015}, but have become a standardized computer vision task in medical vision.


%are trained on diverse views of the hip, knee, and spine to improve robustness to projection angle, contrast, and occlusion. %Performance is usually summarized by per-landmark localization error in pixels or millimeters and the resulting landmark coordinates are then passes as fixed point estimates to downstream tasks such as measurement or registration where all predicted landmarks are implicitly treated as equally reliable.

There are multiple existing approaches for rigid 2D/3D registration of radiography or fluoroscopy to volumetric CT. Most relevant to our work are landmark- and feature-based methods, which assume correspondences between 3D anatomical landmarks in CT and their 2D projections \cite{bier2018x, grupp2020automatic}. These methods are the X-ray/fluoroscopy instance of the Perspective-n-Point (PnP) problem from general imaging \cite{gao2003complete,lepetit2009ep,li2012robust}. We choose to solve this optimization with gradient-based methods because they are relatively simple and appear to perform well in our domain.
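
As a sketch of the Perspective-n-Point problem referred to above, written in illustrative notation that is an assumption for exposition and may differ from the symbols used in later sections, let $\mathbf{X}_i$ denote the $i$-th 3D landmark in CT, $\mathbf{x}_i$ its detected 2D location, $K$ the projection (intrinsic) matrix, and $\pi(u,v,w) = (u/w,\, v/w)$ the perspective division. One common least-squares form seeks the rotation $R$ and translation $\mathbf{t}$ that minimize the total reprojection error over $N$ correspondences:
\[
(\hat{R}, \hat{\mathbf{t}}) = \arg\min_{R,\,\mathbf{t}} \; \sum_{i=1}^{N} \left\| \pi\!\left( K \left( R\,\mathbf{X}_i + \mathbf{t} \right) \right) - \mathbf{x}_i \right\|_2^2 .
\]
In this form every correspondence contributes equally to the objective, which is the implicit assumption made when detected landmarks are passed on as fixed point estimates.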

%of the pelvis has been approached with several strategies.
Another broad class of 2D/3D registration methods is based on matching image intensity. These intensity-based methods align digitally reconstructed radiographs (DRRs) \cite{unberath2018deepdrr, gao2020generalizing, gopalakrishnan2022fast} generated from CT with intra-operative fluoroscopy by optimizing image similarity measures such as correlation- or information-based metrics \cite{gopalakrishnan2024intraoperative}. These methods are general in the sense that they need no outside knowledge about the content of the images, but they scale poorly with the size of both the 2D images and the 3D volumes, leading to long run times and heavy computational costs (see Table~\ref{tab:intensity_vs_landmark}).

%Other learning-based approaches instead train neural networks to predict the pelvis pose directly from one or more X-ray views, often using regression architectures \cite{kendall2015posenet}. These can be made quite accurate, but require ample subject specific training data or a highly realistic DRR method.

%Also, 

%Uncertainty-aware deep learning has been widely explored to improve the reliability of predictions in medical image tasks.
Uncertainty estimation in deep learning and uncertainty-aware architectures are relatively well studied. In this paper we use one of the early approaches, Monte Carlo (MC) dropout, in which dropout layers \cite{srivastava2014dropout} are kept active at test time and multiple stochastic forward passes are used to approximate epistemic uncertainty via the variance of the predictions \cite{gal2016dropout, kendall2017uncertainties}. In medical image analysis, this idea has been applied to segmentation and landmark detection to highlight regions where the network is less confident and to guide human review or post-processing \cite{jungo2018effect, drevicky2020evaluating, ye2023uncertainty}.
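
To make the variance estimate concrete, the following is a minimal sketch in illustrative notation (the symbols are an assumption for exposition, and applying the estimate per landmark coordinate is only one possible choice). Let $\hat{\mathbf{x}}_i^{(s)}$ denote the prediction for landmark $i$ on the $s$-th of $S$ stochastic forward passes, each with an independently sampled dropout mask. The sample mean and a scalar sample variance are then
\[
\bar{\mathbf{x}}_i = \frac{1}{S} \sum_{s=1}^{S} \hat{\mathbf{x}}_i^{(s)},
\qquad
\sigma_i^2 = \frac{1}{S} \sum_{s=1}^{S} \left\| \hat{\mathbf{x}}_i^{(s)} - \bar{\mathbf{x}}_i \right\|_2^2 .
\]
A large $\sigma_i^2$ indicates that the prediction for landmark $i$ changes substantially across dropout masks, which is read as high epistemic uncertainty. The expression above uses the biased $1/S$ normalization; an unbiased $1/(S-1)$ variant is equally plausible.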

More complex uncertainty estimators are possible, but they often do not meet our requirements for downstream weighting. Ensemble methods \cite{rahaman2021uncertainty} have been proposed as a generalization of dropout, since each dropout iteration is often viewed as a member of an ad hoc bootstrapped ensemble, but they require retraining and multiple network evaluations (beyond randomly sampled masks), and so they do not fit our use case. Another family of methods is the conformal prediction framework \cite{shafer2008tutorial}, where ``conformance scores'' effectively rate datapoints as outliers or inliers, allowing classification or regression to split its operating characteristic curves into a geometric product. Our method is not a direct classification, and conformance scores are related to uncertainty only by quantile/order; there is no guarantee that differences in score magnitude correspond to differences in uncertainty. Bayesian methods may also model and then sample parameter weights to form posterior distributions over both networks and outputs \cite{marinescu2020bayesian}, but these sampling methods are often slow and again require numerous network evaluations.

%Beyond voxel-wise uncertainty, one can also conceptually use uncertainty to select or weight features for downstream tasks such as pose estimation. For example, by down-weighting or filtering out landmarks with high predictive variance or by modeling correspondences as Gaussian distributions and optimizing a probabilistic alignment objective. In the context of pose estimation, this perspective naturally motivates using uncertainty not just as a diagnostic output, but as an integral part of the registration pipeline where landmarks with high epistemic uncertainty due to occlusion, low contrast, or ambiguous anatomy can be treated as less reliable constraints, while more confident landmarks can be prioritized or exclusively used to drive pose estimation.

\begin{figure}
\centering
    \includegraphics[width=0.9\textwidth]{3_method/figures/method_ver5.png}
\caption{Overview of the uncertainty-aware pose estimation framework. Monte Carlo (MC) dropout is used to produce uncertainty estimates (in {\color{red} Red}, below Detector) from the primary landmarking network (in {\color{blue} Blue}, Detector), which then weight the landmarks during the Perspective-n-Point (PnP) optimization (in {\color{purple} Purple}). The PnP method has no learnable parameters and is fully differentiable, allowing registration losses to be propagated directly back to the landmarking network.}
\label{fig:method}
\end{figure}