\section{Related Work}
\label{sec:rel}

\noindent\textbf{Sparse-view CT reconstruction.}
Learning 3D CT from sparse X-ray observations is commonly formulated as a supervised image-to-image translation problem. X2CT-GAN~\cite{ying2019x2ct} learns to fuse 2D X-ray features into a 3D generator, while PerX2CT~\cite{kyung2023perspective} further incorporates perspective projection geometry. More recent diffusion- and Transformer-based approaches, such as DiffuX2CT~\cite{liu2024diffux2ct} and DX2CT~\cite{jeong2025dx2ct}, improve reconstruction quality by learning stronger conditional mappings from fixed input views to CT volumes. However, these methods are tied to the view configuration used during training and cannot handle a variable number of views without retraining. NAF~\cite{zha2022naf} is conceptually related in that it optimizes a patient-specific neural representation from projections, but it assumes sparse-view CBCT with $\sim$50 views in its main experiments. Because our mono-/bi-planar setting (1--2 views) operates under a much smaller measurement budget, we omit a direct comparison.
Our TF-PRDiT instead treats each X-ray as a runtime measurement constraint while using a pretrained 3D diffusion prior to recover anatomically plausible structures.

\noindent\textbf{Diffusion inverse solvers.}
Score-based inverse problem solvers and Diffusion Posterior Sampling (DPS)~\cite{song2021solving,chung2022diffusion} show that pretrained diffusion models can be reused as generative priors by enforcing measurement consistency during sampling. Related training-free methods include DDRM~\cite{kawar2022denoising}, which leverages the singular-value structure of linear degradation operators; DDNM~\cite{wang2022zero}, which decomposes the solution into range- and null-space components; and FreeDOM~\cite{yu2023freedom}, which extends test-time guidance to non-linear operators. These methods are mainly developed for 2D image restoration, where forward operators are lower-dimensional and often linear. Sparse-view X-ray-to-CT is more challenging because the operator maps a high-dimensional 3D volume to 2D projections and becomes increasingly underconstrained as the number of views decreases. TF-PRDiT follows the same posterior-sampling principle but instantiates it for native voxel-space 3D CT, where each additional X-ray view contributes a residual term to the guidance objective.
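
As an illustrative sketch only, with notation assumed here rather than taken from the cited works, let $\hat{\mathbf{x}}_0$ denote the denoised volume estimate at a sampling step, $\mathcal{A}_k$ the projection operator of the $k$-th view, and $\mathbf{y}_k$ the corresponding X-ray, for $k = 1, \dots, K$. A DPS-style data-fidelity term that sums per-view residuals could then be written as
\[
\mathcal{L}_{\mathrm{meas}}(\hat{\mathbf{x}}_0) = \sum_{k=1}^{K} \bigl\| \mathcal{A}_k(\hat{\mathbf{x}}_0) - \mathbf{y}_k \bigr\|_2^2 ,
\]
so that adding a view appends one term to the sum while leaving the prior untouched. The squared $\ell_2$ norm and the uniform weighting across views are assumptions of this sketch; the objective actually used may differ.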

\noindent\textbf{Native 3D priors.}
Direct 3D diffusion is expensive because memory and compute scale cubically with resolution. Many 3D generative models therefore rely on latent compression or encoder-decoder architectures, which may weaken fine anatomical boundaries. TF-PRDiT builds on PRDiT~\cite{zhang2026pixellevel}, a voxel-level residual diffusion Transformer prior, and combines it with differentiable measurement guidance to enable training-free conditional sampling across multiple inverse problems.
