\section{Introduction}
\label{Introduction}




% Various methods have been proposed to solve this registration problem, including intensity-based methods that optimize similarity metrics $\mathcal{S}$ between images \cite{gopalakrishnan2024intraoperative}, deep learning approaches that predict the optimal camera pose $\theta^*$ directly from images \cite{kendall2015posenet}, and landmark-based methods that use geometric correspondences between anatomical points \cite{grupp2020automatic}. While these approaches have shown promise, each has significant limitations. 

% Intensity-based registration methods require guidance from the entire image, which can be unnecessary and computationally expensive. Rather than evaluating thousands or millions of pixels across the entire image, it is more efficient to focus on a small set of pixels that are truly informative for this task \cite{suh20252d}. Deep learning-based methods, which directly regress pose from X-ray images, offer rapid inference. However, they generally operate as black-box models whose predictions are difficult to interpret, and they are known to be vulnerable in predicting out-of-plane translation \cite{suh2025better}. Lastly, landmark-based methods often rely on anatomical landmarks that are manually selected and annotated on pre-operative 3D images \cite{bier2018x}. Although these landmarks are anatomically meaningful, it is unclear whether all of them provide sufficient information to optimally estimate pelvic pose. This raises an important question: ``Can we systematically identify a subset of 2D landmarks that are robust and accurate for pose estimation?''

% In this work, we propose a landmark-based 2D/3D pelvis registration framework that incorporates per-landmark uncertainty into the pose estimation process. Instead of assuming that all automatically detected landmarks are equally trustworthy, our approach assigns each landmark a reliability score and uses these scores to decide which landmarks should influence the pose estimate. This allows us to study, in a controlled way, how registration behaves when we deliberately suppress the most uncertain landmarks compared with using the full set. We then experimentally analyze how these uncertainty-informed subsets affect both 2D landmark accuracy and 3D pelvic pose error on fluoroscopy images from held-out patients that were not used for training.
% In this work, we propose a landmark-based 2D/3D pelvis registration framework that selects the landmarks used for registration based on the estimated reliability score of each landmark. Rather than treating all detected landmarks as equally trustworthy, the method estimates per-landmark uncertainty via Monte Carlo (MC) dropout in a U-Net and discards the most unreliable 2D points before solving for the 3D pelvic pose with a perspective projection-based non-linear optimizer. On a public pelvic CT and fluoroscopy dataset under a patient-held-out evaluation, this uncertainty-aware selection is shown to reduce 2D landmark detection error and improve 3D pose accuracy while retaining the computational efficiency and geometric interpretability of landmark-based registration.
% Registration of 2D images to 3D volumetric imaging has many clinical and intra-operative uses. 2D/3D methods can take surgical plans from pre-operative volumetric imaging to the table-side context, or provide depth and perspective to modalities like fluoroscopy which are fast and mobile but lack contrast and have a single fixed perspective \cite{cho2023visualization}. The 2D/3D paradigm is often proposed as a solution for image-guided procedures across a variety of specialties, including orthopedic surgery, vascular procedures, spine surgery, and some trauma and reconstructive surgery.

2D/3D registration is an optimization task that aligns a 2D image with a 3D volume in a common spatial reference frame~\cite{grupp2018patch,unberath2021impact}. In typical workflows, an X-ray or fluoroscopy image is registered to a CT scan so that 3D anatomical information compensates for the loss of depth and perspective in 2D projections. This paradigm is widely used in image-guided orthopaedic, spine, trauma, and vascular procedures that require precise localization of anatomy, tools, and implants, where fast but depth-poor intra-operative fluoroscopy is complemented by registration to restore 3D context using standard operating room equipment~\cite{cho2023visualization}.

Current methods fall into two broad classes: image intensity matching~\cite{unberath2018deepdrr,gao2020generalizing,gopalakrishnan2022fast} and landmark matching~\cite{gao2003complete,lepetit2009ep,li2012robust}. The former uses a forward model of projection and iteratively updates its estimate of the detector's pose relative to the volume by matching the simulated projection to the observed image. Although this approach can achieve high accuracy and generalize across anatomies, each forward pass is generally computationally expensive, which often makes it too slow for bedside applications. Landmark methods instead rely on anatomical knowledge of the target volume and match pre-defined features or landmarks between the 2D image and the 3D volume. Because this avoids the repeated reprojection steps of intensity matching, it is far more computationally tractable, but it is prone to larger errors owing to its sensitivity to point correspondences and the intrinsic variability of anatomical landmarks.
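To make the cost argument concrete, the sketch below recovers a rigid pose from 2D/3D landmark correspondences by minimizing the reprojection error. It is a generic textbook formulation rather than the pipeline used in this work, and it rests on several assumptions: a calibrated pinhole camera with a hypothetical intrinsic matrix $K$, known 3D landmarks $X$ (e.g., annotated on CT) with detected 2D positions $x$, an axis-angle plus translation parameterization of $\theta$, and a plain Levenberg-Marquardt solver with a finite-difference Jacobian. The function names and the synthetic geometry are illustrative only. Each residual evaluation costs a few small matrix products per landmark, whereas an intensity-based method must render a full projection of the volume at every evaluation.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector (3,) -> 3x3 rotation matrix (Rodrigues' formula)."""
    angle = np.linalg.norm(rvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rvec / angle
    Kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * Kx + (1.0 - np.cos(angle)) * Kx @ Kx

def project(pose, X, K):
    """Pinhole projection of 3D points X (N, 3); pose = [rvec (3), t (3)]."""
    R = rodrigues(pose[:3])
    Xc = X @ R.T + pose[3:]            # world frame -> camera frame
    uvw = Xc @ K.T                     # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

def residuals(pose, X, x, K):
    """Stacked 2D reprojection errors, shape (2N,)."""
    return (project(pose, X, K) - x).ravel()

def solve_pnp(X, x, K, pose0, iters=100, lam=1e-3, eps=1e-6):
    """Hypothetical helper: Levenberg-Marquardt on the reprojection error."""
    pose = np.asarray(pose0, dtype=float).copy()
    r = residuals(pose, X, x, K)
    for _ in range(iters):
        # Forward-difference Jacobian of the residuals w.r.t. the 6 pose parameters.
        J = np.empty((r.size, 6))
        for j in range(6):
            d = np.zeros(6)
            d[j] = eps
            J[:, j] = (residuals(pose + d, X, x, K) - r) / eps
        H = J.T @ J
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -J.T @ r)
        r_new = residuals(pose + step, X, x, K)
        if r_new @ r_new < r @ r:      # accept: behave more like Gauss-Newton
            pose, r, lam = pose + step, r_new, lam * 0.5
        else:                          # reject: behave more like gradient descent
            lam *= 10.0
        if np.linalg.norm(step) < 1e-12:
            break
    return pose

# Hypothetical synthetic setup: 1000 px focal length, 512x512 detector,
# 12 landmarks within +/-100 mm, anatomy roughly 800 mm from the source.
rng = np.random.default_rng(0)
K = np.array([[1000.0, 0.0, 256.0],
              [0.0, 1000.0, 256.0],
              [0.0, 0.0, 1.0]])
X = rng.uniform(-100.0, 100.0, size=(12, 3))
true_pose = np.array([0.1, -0.2, 0.05, 10.0, -5.0, 800.0])
x = project(true_pose, X, K)
est_pose = solve_pnp(X, x, K, pose0=np.array([0.0, 0.0, 0.0, 0.0, 0.0, 700.0]))
```

With noise-free synthetic correspondences the recovered pose should match the generating pose up to solver tolerance. With real detections, every landmark enters the least-squares objective with equal weight, so a few badly localized points can pull the solution away; this is the failure mode that motivates weighting or discarding unreliable landmarks.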

%The pelvis is a clinically important yet technically challenging target for 2D/3D registration. It is a large, irregular structure with substantial inter-patient variability, and limited field-of-view, overlapping anatomy, and low contrast in fluoroscopic images further hinder accurate estimation of its 3D pose~\cite{grupp2019pose}. By aligning a patient's pre-operative pelvic CT with intra-operative fluoroscopy, 2D/3D pelvis registration establishes a consistent 3D reference frame for navigation and downstream quantitative measurement.

% The goal of 2D/3D pelvis registration is to estimate the rigid-body pose $\theta \in \mathrm{SE}(3)$ that aligns the patient's 3D anatomy with the intra-operative fluoroscopy image. This goal is formulated as the solution to the optimization problem
% \begin{align}
%     \theta^* = \arg\min_{\theta \in \mathrm{SE}(3)} \,
%     \mathcal{S}\big( P(\theta, V), I \big),
%     \label{eq:goal}
% \end{align}
% where $V$ represents the 3D modality (e.g., CT volume or 3D landmark set), $I$ denotes the 2D modality (e.g., fluoroscopy or 2D landmark coordinates), and $P(\theta, \cdot)$ is the projection operator that maps the 3D data to the 2D domain at pose $\theta$. The function $\mathcal{S}$ is a similarity metric quantifying the alignment between the projected source $P(\theta, V)$ and the target $I$.

%The registration of intra-operative imaging to a pre-operative scan or intersubject template is





% A variety of methods have been proposed to instantiate Eq.~\eqref{eq:goal}, including intensity-based methods that optimize similarity metrics $\mathcal{S}$ between images, deep learning approaches that regress the optimal camera pose $\theta^*$ directly from images, and landmark-based methods that use geometric correspondences between anatomical points~\cite{gopalakrishnan2024intraoperative,kendall2015posenet,grupp2020automatic}. Intensity-based registration exploits information from the entire image but can be computationally expensive and may rely on structures that are not informative for pose estimation. Direct pose regression networks offer rapid inference but typically behave as black-box predictors and have known weaknesses for certain degrees of freedom such as out-of-plane translation~\cite{suh2025better}. Landmark-based methods provide an interpretable geometric formulation but often assume that all predicted landmarks are equally reliable, even though some landmarks may be occluded, ambiguous, or poorly localized~\cite{bier2018x,suh20252d}.


% We propose an uncertainty-aware framework that explicitly models the reliability of each anatomical point using MC dropout to instantiate Eq.~\eqref{eq:goal}. Unlike standard approaches that rely on fixed point estimates, our method estimates epistemic uncertainty to identify unreliable predictions caused by occlusion or ambiguous anatomy. We leverage this uncertainty in a dual-stage strategy: first, during training, we employ a weighted loss function that prioritizes high-confidence landmarks to stabilize fine-tuning; second, at test time, we apply a dynamic filtering mechanism to discard the most uncertain landmarks before optimizing the pose. We demonstrate that this approach significantly reduces registration error by effectively isolating and rejecting the outliers that typically disrupt rigid-body alignment.

In the present work, we propose an uncertainty-aware framework that models the reliability of each anatomical point during the landmark identification phase and uses that estimate as an optimization weight in the subsequent pose estimation phase. By integrating per-landmark uncertainty into a fully differentiable landmark detection and Perspective-n-Point (PnP) pipeline, our method stabilizes pose estimation by increasing the influence of trustworthy keypoints and suppressing unreliable ones. Our contributions are threefold: (1) we introduce a differentiable uncertainty-to-weight formulation that enables continuous landmark weighting during both training and inference of landmark-based PnP pose estimation; (2) we show that our uncertainty-based landmark selection strategies improve robustness even without retraining; and (3) we provide empirical evidence that the estimated uncertainty reflects landmark reliability, yielding substantially improved 2D/3D pelvis registration performance.
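One possible instantiation of the uncertainty-to-weight idea is sketched below under explicit assumptions. Per-landmark uncertainty is taken as the trace of the sample covariance of $T$ Monte Carlo dropout predictions (i.e., $T$ stochastic forward passes of the landmark detector with dropout left active); the continuous weights are a temperature-scaled softmax of the negative uncertainty; and the hard selection strategy keeps the $k$ least uncertain landmarks. The weight function, the temperature $\tau$, and all function names are hypothetical choices for illustration and are not claimed to match the formulation used in this work. The detector outputs are replaced by simulated samples with made-up noise levels.

```python
import numpy as np

def mc_dropout_statistics(samples):
    """Summarize T stochastic predictions of N 2D landmarks.

    samples: (T, N, 2), one slice per forward pass with dropout active.
    Returns mean positions (N, 2) and a scalar uncertainty per landmark (N,),
    defined here (by assumption) as var(u) + var(v), the covariance trace.
    """
    mean = samples.mean(axis=0)
    uncertainty = samples.var(axis=0, ddof=1).sum(axis=-1)
    return mean, uncertainty

def soft_weights(uncertainty, tau=1.0):
    """Hypothetical differentiable map: w_i = N * softmax(-u_i / tau).

    Smooth in u, so it can be used inside a training loss; weights sum to N,
    so equal uncertainties give unit weights.
    """
    z = -uncertainty / tau
    z = z - z.max()  # softmax is shift-invariant; this avoids overflow
    w = np.exp(z)
    return len(uncertainty) * w / w.sum()

def keep_most_certain(uncertainty, k):
    """Hard selection: boolean mask keeping the k least uncertain landmarks."""
    mask = np.zeros(uncertainty.shape, dtype=bool)
    mask[np.argsort(uncertainty)[:k]] = True
    return mask

def weighted_reprojection_loss(projected, detected, weights):
    """sum_i w_i * ||projected_i - detected_i||^2 for (N, 2) point arrays."""
    sq_err = np.sum((projected - detected) ** 2, axis=-1)
    return float(np.sum(weights * sq_err))

# Simulated MC dropout output: 8 stable landmarks and 2 unstable ones
# (e.g., occluded); noise levels are illustrative, not measured.
rng = np.random.default_rng(0)
T, N = 30, 10
true_xy = rng.uniform(0.0, 512.0, size=(N, 2))
noise_px = np.array([1.0] * 8 + [20.0] * 2)
samples = true_xy + rng.normal(size=(T, N, 2)) * noise_px[None, :, None]

mean_xy, unc = mc_dropout_statistics(samples)
w = soft_weights(unc, tau=10.0)
mask = keep_most_certain(unc, k=8)
```

Because the softmax weights are smooth functions of the uncertainty, gradients can flow through them during training, whereas the top-$k$ mask is a non-differentiable rule better suited to test-time filtering before the pose solve. Inverse-variance weights, $w_i \propto 1/(\sigma_i^2 + \epsilon)$, are a common alternative to the softmax form.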