\section{Introduction}
\label{sec:intro}

\begin{figure}[t]
  \centering
  \includegraphics[width=0.92\linewidth]{Images/overview.png}
  \caption{Overview of TF-PRDiT. A frozen voxel-level 3D Diffusion Transformer prior is guided at inference by measurement-consistency gradients from a task-specific forward operator $\mathcal{A}$. By replacing $\mathcal{A}$, the same frozen model handles X-ray-to-CT reconstruction and volumetric restoration tasks without retraining.}
  \label{fig:overview}
  % \vspace{-12pt}
\end{figure}

Recovering 3D medical volumes from sparse, incomplete, or degraded measurements is a central inverse problem in computational radiology. Fully sampled CT provides rich anatomical information, but its acquisition is constrained by scanner availability, clinical cost, specialized equipment, and radiation exposure. In contrast, 2D X-rays are widely available and low-dose, yet mapping one or a few projections to a full 3D volume is severely ill-posed.
A useful reconstruction framework should therefore enforce the available measurements while preserving anatomically plausible 3D structure. Most existing X-ray-to-CT methods learn a supervised mapping for a fixed measurement setting~\cite{jin2017deep,ying2019x2ct,kyung2023perspective,liu2024diffux2ct,jeong2025dx2ct}. These models can work well under the view configuration used during training, but changing the geometry or number of views usually requires architectural changes or retraining. This rigidity is especially limiting in clinical workflows, where imaging protocols vary and paired 3D training data are expensive.

Training-free diffusion-based inverse solvers offer a different route: a pretrained generative prior is kept fixed, and measurement consistency is imposed during sampling through a known forward operator~\cite{kawar2022denoising,song2021solving,chung2022diffusion,wang2022zero}. This posterior-sampling view, notably formalized in Diffusion Posterior Sampling (DPS)~\cite{chung2022diffusion}, allows the same prior to be reused across tasks without retraining. However, most prior work has focused on 2D images or relatively simple restoration operators. Extending this principle to native 3D medical volumes and projection-to-volume reconstruction remains challenging because volumetric generation is computationally costly, sparse X-ray measurements are severely underdetermined, and latent compression can remove fine anatomical details.
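
As a brief illustration in generic notation (the symbols $\mathbf{y}$, $\mathbf{n}$, $\sigma$, $\mathbf{x}_t$, and $\hat{\mathbf{x}}_0$ are introduced here only for exposition and are assumptions, not notation fixed elsewhere in this paper), consider measurements with additive Gaussian noise, $\mathbf{y} = \mathcal{A}(\mathbf{x}_0) + \mathbf{n}$ with $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$. DPS approximates the intractable likelihood score at diffusion step $t$ by evaluating the likelihood at the denoised estimate $\hat{\mathbf{x}}_0(\mathbf{x}_t)$ produced by the frozen prior:
\[
  \nabla_{\mathbf{x}_t} \log p(\mathbf{y} \mid \mathbf{x}_t)
  \approx \nabla_{\mathbf{x}_t} \log p\bigl(\mathbf{y} \mid \hat{\mathbf{x}}_0(\mathbf{x}_t)\bigr)
  = -\frac{1}{2\sigma^2} \nabla_{\mathbf{x}_t} \bigl\| \mathbf{y} - \mathcal{A}\bigl(\hat{\mathbf{x}}_0(\mathbf{x}_t)\bigr) \bigr\|_2^2 .
\]
In practice, the factor $1/(2\sigma^2)$ is typically replaced by a tunable step size. Because the gradient is taken through $\mathcal{A}$, the operator must be differentiable, but the prior itself is never updated.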

In this paper, we present TF-PRDiT, a \textbf{t}raining-\textbf{f}ree extension of the \textbf{p}ixel-level \textbf{r}esidual \textbf{di}ffusion \textbf{t}ransformer that performs conditional sampling at test time to solve medical inverse problems. Our method follows the DPS-style idea of enforcing measurements during sampling rather than retraining the generative model; our contribution is to adapt this idea to native 3D CT priors and differentiable medical forward models. For X-ray-to-CT, the forward operator is a differentiable DRR projector that maps a sampled 3D volume to one or more 2D projections. For other inverse problems, it can be replaced by a downsampling, masking, or blurring operator. This formulation makes the number of available X-rays a runtime input rather than a training-time architectural assumption.

Our contributions are: (1) a training-free sampler extending DPS to native voxel-space 3D CT with differentiable volumetric forward operators, (2) denoised-prediction guidance with a cosine-decay schedule and a $k$-step predictor--corrector for stable high-dimensional conditioning, and (3) a single frozen prior that scales to 1--12 X-ray views and transfers to super-resolution, infilling, and deblurring by swapping only $\mathcal{A}$.
