% This is samplepaper.tex, a sample chapter demonstrating the
% LLNCS macro package for Springer Computer Science proceedings;
% Version 2.21 of 2022/01/12
%
\documentclass[runningheads]{llncs}
%
\usepackage[T1]{fontenc}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{siunitx}

% T1 fonts will be used to generate the final print and online PDFs,
% so please use T1 fonts in your manuscript whenever possible.
% Other font encodings may result in incorrect characters.
%
\usepackage{graphicx}
\usepackage{subcaption}
% Used for displaying a sample figure. If possible, figure files should
% be included in EPS format.
%
% If you use the hyperref package, please uncomment the following two lines
% to display URLs in blue roman font according to Springer's eBook style:
%\usepackage{color}
%\renewcommand\UrlFont{\color{blue}\rmfamily}
%\urlstyle{rm}
%
\begin{document}
%
\title{Extended Partial Angle Based Motion Compensation for Dental CBCT}
%
%\titlerunning{Abbreviated paper title}
% If the paper title is too long for the running head, you can set
% an abbreviated paper title here
%
%\author{Cristina Sarti\inst{1}\orcidID{0000-1111-2222-3333} \and
%Mikhail Mikerov \inst{1}\orcidID{1111-2222-3333-4444} \and
%Claudio Landi\inst{1}\orcidID{2222--3333-4444-5555}}
\author{Cristina Sarti\inst{1} \and
Mikhail Mikerov \inst{1} \and
Claudio Landi\inst{1}}
%
\authorrunning{C. Sarti et al.}
% First names are abbreviated in the running head.
% If there are more than two authors, 'et al.' is used.
%
\institute{See Through S.r.l, via Bolgara 2, 24060 Brusaporto BG, Italy
\email{cristina.sarti@seethrough.one}}
%
\maketitle              % typeset the header of the contribution
%
\begin{abstract}
Motion artifacts can degrade image quality in dental cone-beam CT and complicate diagnosis. In some cases, an exam retake is necessary, resulting in additional radiation exposure for the patient without any guarantee of improved image quality. Motion compensation methods therefore play a crucial role. Many existing methods are time-consuming because they require several reconstructions. We propose an efficient method that requires only two partial-angle reconstructions. It assumes that the patient remains still during the acquisition, except for a short interval. In this situation, two motion-free partial-angle volumes, one before and one after patient motion, can be reconstructed. Motion compensation is achieved by registering forward projections of the two volumes. To enhance the robustness of the registration step, we use a conditioned U-Net trained on a target-specific dataset to simulate an extended angular range for each of the two partial volumes. Qualitative analysis shows that the method substantially reduces motion artifacts, even for challenging motion patterns.

\keywords{Dental CBCT  \and Motion compensation \and Partial-angle reconstruction \and Angular range extension}
\end{abstract}
%
%
%
\section{Introduction}

Dental cone-beam CT (CBCT) is a fully 3D imaging modality that provides accurate volumetric images of the oral cavity. Its applications include implant planning, detection of caries and periodontal diseases, orthodontic treatment, and exodontics \cite{SCHULZE2020}.
A typical dental CBCT device consists of a C-arm with an X-ray source and a detector mounted on opposite sides, which rotates in the horizontal plane around the patient's head. The patient can either stand or sit, although the standing position is more common. During the scan, the patient's chin rests on a chin rest, the head is stabilized using head fixation, and the patient holds a handle to minimize body movement. Additionally, the patient may be asked to bite on a bite block attached to the chin rest for more stability. The absence of motion is crucial for good image quality, since image reconstruction algorithms rely on a well-defined geometry at each acquisition step.

Despite fixation, the patient can still move during a scan, which typically lasts between 5 and 40 seconds. Images affected by motion artifacts are often characterized by blurry edges, undefined contours, or a doubled appearance of structures \cite{Moratin2020}. Due to the ALARA\footnote{As Low As Reasonably Achievable} principle, image retakes should be avoided to reduce radiation dose, even at the cost of image quality. Therefore, it becomes important to be able to reduce the appearance of motion artifacts in the image domain using non-optimal projection data.

Several solutions have been proposed in recent years to perform motion correction in dental CBCT. One class of such solutions relies on additional motion tracking hardware installed on the C-arm that enables the estimation of the motion direction and amplitude to modify the projection data \cite{spin2018ex}. These solutions are rarely used in clinical practice, since they increase device costs and complicate installation, calibration, and maintenance. 

Another class of solutions relies on extracting the necessary motion parameters from the acquired projection data and imperfect reconstructions. Sun \textit{et al.} proposed an iterative joint motion estimation and image update method \cite{sun2021motion}. Maur \textit{et al.} estimate the motion parameters by detecting, reconstructing, and forward projecting the contours of anatomical features \cite{maur2019cbct}. Autofocus approaches that aim to improve a metric in the image domain, e.g., image sharpness in the bone region, have also been applied to this problem \cite{sisniega2017motion}.

Previously, we presented an approach for motion compensation based on the registration of two volumes reconstructed using only limited angular ranges \cite{Sarti2025}. We assumed that the patient remained still in an initial position, then briefly moved and remained still in a second position. We reconstructed two partial volumes corresponding to the two motion-free segments of the full scan and added the projections affected by motion to one of the two volumes. In general, very little common information is present in both volumes, since they are reconstructed using non-overlapping projection data. However, if the two volumes do share common edges, it is possible to register their forward-projected data and estimate the motion parameters needed for correction more reliably. Excluding the projections affected by motion from both reconstructions can also have a positive impact on the final motion parameter estimation, especially if the motion duration exceeds one second.

In this work, we extend and optimize our previous method. We exclude the motion-affected projections from each partial reconstruction and incorporate a conditioned U-Net to modify partial-angle reconstructions to cover a larger angular range, thus increasing the amount of common features when the volumes are forward projected.


\section{Materials and Methods}
In this section, we provide details about all the steps of our motion compensation method. These steps include the generation of two partial-angle reconstructions, the optimization of the motion compensation parameters with 2D-3D registration, and the training and application of a conditioned U-Net for volume modification in the image domain.

\subsection{Geometric calibration}

During installation, a CBCT device is usually calibrated with an offline calibration process. A geometric calibration using specially designed phantoms is performed to calculate so-called \textit{projection matrices} corresponding to each detector position $i$ of a total of $N$ positions assumed by the C-arm during the X-ray scan. They contain the parameters describing the projection geometry of the 3D patient head onto the 2D detector. Each matrix $P_i \in \mathbb{R}^{3 \times 4}$ can be written as a product of two matrices: 
\begin{equation}
    P_i = K \cdot G_i, \quad i=1, \ldots,N.
    \label{eq:proj_matrix}
\end{equation}
The matrix $G_i=[R_i|t_i] \in \mathbb{R}^{4 \times 4}$, written in homogeneous coordinates, contains the extrinsic parameters, i.e., the parameters describing the relative translation and rotation of the X-ray source with respect to the patient's position. The matrix $K \in \mathbb{R}^{3 \times 4}$ contains the intrinsic parameters of the projections, such as the source-detector distance, the detector origin, and the pixel size \cite{hartley2004multiple}.
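
As a sketch of how the two factors in Equation (\ref{eq:proj_matrix}) fit together, the matrices can be written out under a standard pinhole model \cite{hartley2004multiple}. The symbols $d$ (source-detector distance), $s$ (pixel size), and $(u_0, v_0)$ (detector origin in pixels) are illustrative names introduced here and may differ from the parametrization used in the actual calibration:
\begin{align*}
K &= \begin{bmatrix} d/s & 0 & u_0 & 0 \\ 0 & d/s & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, &
G_i &= \begin{bmatrix} R_i & t_i \\ 0^\top & 1 \end{bmatrix},
\end{align*}
with $R_i \in \mathbb{R}^{3 \times 3}$ and $t_i \in \mathbb{R}^{3}$. With this convention, a point $(x, y, z)$ in the patient coordinate system is mapped to the detector pixel $(u, v)$ through
\[
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P_i \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
\]
where $\lambda$ is the homogeneous scale factor.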

Patient motion during acquisition can partially or entirely invalidate the parameters estimated during calibration. The goal of motion compensation is to restore a correct geometry and to output an artifact-free reconstruction. To reach this goal, our method calculates a transformation matrix  ${G_i}^{\prime}=[{R_i}^{\prime}|{t_i}^{\prime}] \in \mathbb{R}^{4 \times 4}$ for each device position $i$ such that
\begin{equation}
    {P_i}^\prime = K \cdot G_i \cdot {G_i}^{\prime}, \quad i=1, \ldots,N.
    \label{eq:proj_matrix_comp}
\end{equation}
%
The matrix ${G_i}^{\prime}$ contains the rotation and translation parameters that best compensate for patient motion.


\subsection{Partial-angle reconstruction}

Similarly to our previously proposed method \cite{Sarti2025}, we perform motion compensation by partial-angle volume registration. By \textit{partial-angle reconstruction} we mean a reconstruction that includes only a small subset of the full set of projections collected during a scan. Figure \ref{fig:full_partial} compares a complete reconstruction and a partial-angle reconstruction covering $60$ degrees. Both volumes are reconstructed using the Feldkamp-Davis-Kress (FDK) algorithm \cite{feldkamp1984practical}. The partial-angle reconstruction shows artifacts because only the information contained in a small subset of the projections is reconstructed.
\begin{figure}%[htbp]
\centering
\includegraphics[width=0.8\textwidth]{Figures/comparison.pdf}
    \caption{Complete reconstruction and partial-angle reconstruction ($60$ degrees).}
    \label{fig:full_partial}
\end{figure}

In our approach, we assume that the patient remains still during the first portion of the scan, then moves and remains still in a new position during the final part of the scan. The patient motion is detected separately, and its detection is not part of the reported method. In our previous method, we registered the two partial-angle volumes assuming that the portion of the scan affected by motion was small compared to either partial reconstruction. We therefore included the projections affected by motion in one of the two reconstructions. The method turned out to be very robust for abrupt motion lasting $0.5$ seconds, but its reliability decreased for motion exceeding $1$ second.

To be able to compensate for longer motion, we propose a new strategy and modify some of the steps used to generate the reference partial-angle reconstructions and to perform registration: (1) The projections affected by motion are no longer included in one of the partial reconstructions. This means that the two partial-angle reconstructions are obtained using only motion-free projection data acquired before and after patient motion. The angular gap between the two partial reconstructions corresponds to the motion extent. (2) In order to extend the angular range covered by each partial volume, we apply a suitably trained U-Net. We simulate the addition of projections within the angular gap to each of the two original partial volumes. This leads to a higher number of common structures and stabilizes the subsequent 2D-3D registration step.


\subsection{Conditioned U-Net}
U-Nets are widely used for image-to-image translations in medical imaging. We apply a U-Net architecture to transform partial-angle reconstructions obtained from data acquired over a limited angular range $\Delta \theta$, so that they approximate reconstructions covering a slightly extended range $\Delta \theta + \Delta \phi$. In general, the appearance of the partial volumes will vary depending on whether the additional angular range $\Delta \phi$ is added before the first or after the last angle in the initial reconstruction. We control the output of the network by defining the side to which the additional angular range should be added.
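
To make the two extension directions explicit, the angular intervals can be written as follows, where $\theta_0$ denotes the first angle of the initial reconstruction and $c \in \{0, 1\}$ the side indicator passed to the network. Both symbols, and the assignment of $0$ and $1$ to the two sides, are notation assumed here for illustration:
\[
[\theta_0,\ \theta_0 + \Delta\theta] \;\longmapsto\;
\begin{cases}
[\theta_0 - \Delta\phi,\ \theta_0 + \Delta\theta], & c = 0 \ \text{(before)},\\
[\theta_0,\ \theta_0 + \Delta\theta + \Delta\phi], & c = 1 \ \text{(after)}.
\end{cases}
\]
In both cases the output covers a range of $\Delta\theta + \Delta\phi$, but the reconstructed content differs.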

\subsubsection{Architecture}
For this task, we added a control mechanism in the form of simple Feature-wise Linear Modulation (simple FiLM) layers \cite{Brocal2019} to the encoder block of a standard 3D U-Net working on 3D patches. The simple FiLM layer is an additional fully connected layer in each double convolution block that receives an indicator of the extension direction as either 0 or 1 and outputs two scalars, the multiplicative factor $\gamma$ and the offset $\beta$. These are used to modify the output of the block. The network's architecture is displayed in Figure \ref{fig: architecture}.
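
A minimal formulation of this modulation, assuming the usual FiLM form with scalar parameters and no additional nonlinearity, is given below. The symbols $c$ (direction indicator), $w, b \in \mathbb{R}^{2}$ (weights and biases of the fully connected layer), and $x$, $y$ (feature maps before and after modulation) are illustrative and not taken from the implementation:
\begin{align*}
(\gamma, \beta)^\top &= w\,c + b, \qquad c \in \{0, 1\},\\
y &= \gamma \cdot x + \beta.
\end{align*}
Since $c$ takes only two values, each block effectively switches between two learned pairs: $(\gamma, \beta)^\top = b$ for $c = 0$ and $(\gamma, \beta)^\top = w + b$ for $c = 1$.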

\begin{figure}[t]
\includegraphics[width=\textwidth]{Figures/network.pdf}
\caption{Architecture of a 3D U-Net with simple FiLM modulation layers. Black arrows represent skip connections; dashed rectangles represent data concatenation. Patch size and number of channels are displayed above each patch.}
\label{fig: architecture}
\end{figure}

\begin{figure}
    \begin{subfigure}[t]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{Figures/input_patient_distribution_abstract.pdf}
    \end{subfigure}
    \hfill
    \begin{subfigure}[t]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{Figures/input_starting_angle_distribution_abstract.pdf}
    \end{subfigure}
    \caption{Left: number of input images in the dataset per patient. Right: number of input images in the dataset per starting projection number in the partial-angle reconstruction.}
    \label{fig: dataset}
\end{figure}

\subsubsection{Dataset preparation}
Dental cone-beam CT data is characterized by high variability. For example, some patients may have implants, crowns, or other metal inserts which can lead to significant image artifacts even without motion. Moreover, missing teeth are a common pathology. The operator can influence the image, too, by instructing the patient to bite on a bite block, thereby introducing an additional visible structure into the scan. Finally, another source of variability is the time point at which the patient moves and the motion extent. We constructed our dataset to account for as much variability as possible. 

We started by selecting seven highly variable patients. Then, we randomly sampled the starting projection number of a partial-angle reconstruction from a uniform distribution of possible projection numbers. If the projection number was too close to a projection number already present in the dataset, it was not included. Finally, for each angle, we randomly assigned one of the seven patients. Figure \ref{fig: dataset} displays the distribution of the input volumes corresponding to the selected patients and projection numbers in the dataset. This randomized approach allows us to add the sought variability to the dataset without making it very large, as would have been the case if the data from all patients had been reconstructed at the same positions. The same reconstruction is included twice in the dataset if it can be extended in both directions. Otherwise, it appears in the dataset only once. For example, if the starting projection number is 10, only 9 projections can be added before the first projection. Since our decoder is trained to add an angular range corresponding to 40 projections, we do not have matching ground truth for this case.
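
The availability of ground truth can be stated as a simple index condition. Assuming 1-based projection numbers, a scan with $N$ projections, a partial-angle reconstruction that starts at projection $s$ and contains $n$ projections, and an extension of $e = 40$ projections (the symbols $s$, $n$, and $e$ are introduced here for illustration), the two extension directions are available if
\begin{align*}
\text{backward:}\quad & s - e \geq 1,\\
\text{forward:}\quad & s + n - 1 + e \leq N.
\end{align*}
For the example above, $s = 10$ gives $s - e = -30 < 1$, so the backward extension is excluded and the reconstruction enters the dataset at most once.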

\subsubsection{Training}
The conditioned U-Net was trained using PyTorch's automatic mixed precision package with early stopping regularization for 13 epochs on $32 \times 32 \times 32$ patches with stride 16 in all three directions. The patches were extracted from $316 \times 316 \times 344$ volumes with \SI{320}{\micro\metre} voxel size. The training dataset contained reconstructions from the first six patients, and the data of the seventh patient were used for validation. The learning rate of the Adam optimizer was set to 0.0001. No learning-rate scheduler was used. We used the MSE loss as the cost function for this task.
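
As a rough check of the amount of training data per volume, the number of patch positions along an axis of length $L$ for patch size $p = 32$ and stride $q = 16$ is $\lfloor (L - p)/q \rfloor + 1$, assuming that patches lie entirely inside the volume and that no padding is applied (the border handling is not specified above, and $L$, $p$, $q$ are names introduced here):
\begin{align*}
L = 316:&\quad \lfloor 284/16 \rfloor + 1 = 18, \\
L = 344:&\quad \lfloor 312/16 \rfloor + 1 = 20, \\
\text{patches per volume}:&\quad 18 \cdot 18 \cdot 20 = 6480.
\end{align*}
The MSE loss between a predicted patch $\hat{y}$ and the corresponding ground-truth patch $y$ with $n = 32^3$ voxels is
\[
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{n} \sum_{k=1}^{n} \left(\hat{y}_k - y_k\right)^2 .
\]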

\subsection{Partial-angle reconstruction registration}

After modification using the U-Net, the two partial-angle volumes share common structures corresponding to the data within the angular gap. The volumes are normalized and binarized by applying an experimentally estimated threshold. This operation enables the extraction of hard tissue structures like bones and teeth and the removal of soft tissues and possible artifacts that could affect the subsequent registration process. The binarized volumes are forward projected at $M$ sampling positions within the gap. For each position $j$, we have a pair of projections $(r_j, r_j')$ defined as
\begin{align}
 r_j=P_j\cdot V_I,\quad
 {r_j}^{\prime}=P_j\cdot{G_j}^{\prime} \cdot V_{II}, \quad j= 1,\ldots, M.
\end{align}
The volumes $V_I$ and $V_{II}$ are the partial volumes. The matrix $P_j$ is the projection matrix that corresponds to the acquisition position $j$. The compensation matrix ${G_j}^{\prime}$ is defined in Equation (\ref{eq:proj_matrix_comp}).

The process of retrieving the parameters of ${G_j}^{\prime}$ is formulated as a minimization problem. Because the head rests on a chin rest that acts as a hinge support, large translational movements are unlikely \cite{hernandez2018}. Moreover, since the effect of small translations was found to be well approximated by small rotations, we limit our estimation to the rotation parameters $(rX, rY, rZ)$. As explained in \cite{Sarti2025}, we start the minimization process by extracting hard tissue edges in the forward-projected images. We apply a gradient operator $\vec{\nabla}=({\nabla}_x, {\nabla}_y)$ and generate the corresponding gradient images $(\vec{\nabla} r_j, \vec{\nabla} {r_j}^{\prime})$. We define the cost function $\Phi(\vec{\nabla} r_j, \vec{\nabla} r_j')$ as in \cite{Sarti2025}, and for each sampling position $j$ we retrieve the entries of the matrix ${G_j}^{\prime}$ by repeating the 2D-3D registration step until the function $\Phi$ reaches a minimum.
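
The per-position estimation can be summarized as the following minimization, in which the compensation matrix is restricted to a pure rotation. The composition order of the three elementary rotations and the hat notation for the estimates are assumptions made here for illustration; the definition of $\Phi$ is given in \cite{Sarti2025}:
\begin{align*}
{G_j}^{\prime}(rX, rY, rZ) &= \begin{bmatrix} R_z(rZ)\, R_y(rY)\, R_x(rX) & 0 \\ 0^\top & 1 \end{bmatrix},\\
(\widehat{rX}_j, \widehat{rY}_j, \widehat{rZ}_j) &= \operatorname*{arg\,min}_{(rX,\, rY,\, rZ)} \Phi\bigl(\vec{\nabla} r_j,\ \vec{\nabla} {r_j}^{\prime}(rX, rY, rZ)\bigr),
\end{align*}
where ${r_j}^{\prime}(rX, rY, rZ) = P_j \cdot {G_j}^{\prime}(rX, rY, rZ) \cdot V_{II}$ and $R_x$, $R_y$, $R_z$ denote rotations about the three coordinate axes.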

The final matrix used to align $V_I$ and $V_{II}$ is the average of the matrices estimated for each position $j =1, \ldots, M$. To correct the motion-affected projections within the angular gap and to ensure a smooth transition between the first and the second part of the final volume, we linearly interpolate the values from 0 to $(rX, rY, rZ)$ across the angular gap.
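
The averaging and interpolation steps can be written compactly as follows. We assume here that the average is taken over the estimated rotation parameters, denoted $(\widehat{rX}_j, \widehat{rY}_j, \widehat{rZ}_j)$ for sampling position $j$, and that the gap spans the projection indices $i_a, \ldots, i_b$; the index names and the per-parameter averaging are illustrative assumptions:
\begin{align*}
(rX, rY, rZ) &= \frac{1}{M} \sum_{j=1}^{M} (\widehat{rX}_j, \widehat{rY}_j, \widehat{rZ}_j),\\
(rX_i, rY_i, rZ_i) &= \frac{i - i_a}{i_b - i_a}\,(rX, rY, rZ), \qquad i = i_a, \ldots, i_b.
\end{align*}
Under this convention, projections before the gap ($i < i_a$) keep the identity transformation, and projections after the gap ($i > i_b$) use the full parameters $(rX, rY, rZ)$.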

\section{Data}

We tested our method with real patient data acquired using the CBCT device Seethrough Max (See Through s.r.l, Brusaporto, Italy). The acquisitions lasted $14$ seconds and covered an angle slightly larger than $180$ degrees. The size of the reconstructed field of view is $10  \times 11$ centimeters.
Since the collected data did not present motion artifacts, we simulated different motion patterns (nodding, tilting, axial rotation, translation) and durations by modifying the projection matrices at some angular positions, following the strategy described in \cite{abdul2023motion}. We particularly focused on movements lasting about $1.5$ to $2.0$ seconds, for which our previously implemented method did not perform reliably. Furthermore, we focused on movements that occurred either at the very beginning or at the very end of the acquisition, as these are especially challenging to compensate for. In the next section, we present exemplary results for three cases: (1) nodding or axial rotation with translation (\SI{1.5}{\second}); (2) combined nodding, tilting, axial rotation, and translation (\SI{1.5}{\second}); (3) nodding only (\SI{2}{\second}). A comparison with our previous method \cite{Sarti2025} is also provided.
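
To relate the motion duration to the size of the angular gap, a rough estimate can be obtained by assuming a constant rotation speed $\omega$ and an angular range of exactly $180$ degrees. The actual range is slightly larger, so these values are slight underestimates, and the symbols $\omega$ and $\alpha_{\text{gap}}$ are introduced here for illustration:
\begin{align*}
\omega &\approx \frac{180^\circ}{14\,\mathrm{s}} \approx 12.9^\circ/\mathrm{s},\\
\alpha_{\text{gap}}(1.5\,\mathrm{s}) &\approx 19.3^\circ, \qquad
\alpha_{\text{gap}}(2\,\mathrm{s}) \approx 25.7^\circ.
\end{align*}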

\section{Results} 
On the left, Figure \ref{fig:case1_2} shows the results of our motion compensation method for Case 1. Strong motion was simulated as a combination of a single rotation (nodding or axial rotation) and a translation. In both cases, artifacts are almost fully corrected or strongly attenuated, even when the patient has metal inserts. The same can be observed for more complicated types of motion resulting from a combination of rotations (nodding, tilting, axial rotation) and a translation (Case 2). In this case as well, artifacts could be almost completely corrected or significantly attenuated, as shown in Figure \ref{fig:case1_2} on the right. Finally, we compared our current approach with our previous method. Figure \ref{fig:case3} shows that the current method also performs very well in the case of a longer nodding motion of about 2 seconds (Case 3).

\begin{figure}
    \begin{subfigure}[t]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{Figures/case1.pdf}
    \end{subfigure}
    \hfill
    \begin{subfigure}[t]{0.48\textwidth}
        \centering
        \includegraphics[width=\textwidth]{Figures/case2.pdf}
    \end{subfigure}
    \caption{Motion-affected, compensated, and ground truth axial slices for translational motion combined with rotation around a single axis (left) and with rotation around multiple axes (right). The motion duration was 1.5 seconds.}
    \label{fig:case1_2}
\end{figure}
\begin{figure}%[htbp]
\centering
\includegraphics[width=0.85\textwidth]{Figures/rx3.pdf}
    \caption{Comparison of the current method with our previous method on a case of nodding for 2 seconds.}
    \label{fig:case3}
\end{figure}

\section{Discussion and Conclusion}

In this paper, we present a robust, optimized method to compensate for motion in dental CBCT images. The current method preserves the advantages of the previous method \cite{Sarti2025}.
It is robust and fast, since only forward-projected data are registered and only two partial-angle volumes are reconstructed.
Very few projections, taken at predefined sampling angles within the angular gap, are necessary for the 2D-3D registration process to recover the motion compensation parameters. The novelty of the current approach lies in combining our previous strategy with a deep learning approach. With a suitably trained conditioned U-Net, we could modify each partial-angle volume to cover a larger angular range. In this way, we were able to compensate for longer and more complex motion patterns. Adding simple FiLM layers to the U-Net architecture reduces the number of U-Nets needed for inference. In principle, the simple FiLM layers can be inserted in both the encoder and the decoder. We did not find any advantage in adding them to the decoder block. We believe that it is more important to create different deep representations of an input depending on the side on which the additional angular range should be added than to modify the decoding of the same deep representation.

Our current results are very promising and suggest the possibility of using the optimized method to compensate for motions longer than two seconds. It is also conceivable to apply the method to compensate for artifacts due to multiple motion patterns. Multiple partial-angle volumes could be reconstructed and iteratively aligned using the motion compensation approach. Once motion artifacts are sufficiently compensated, the diagnostic quality of the final fully reconstructed volume may be restored. Finally, additional experiments are necessary involving various patient pathologies and various types of motion to determine whether some data types and artifacts are more challenging to compensate for than others.

\begin{credits}
\subsubsection{Acknowledgments}
The authors thank Lorenzo Arici, Andrea Delmiglio, Luca Fracassetti, and Ivan Tomba for their valuable contributions to the implementation of the reconstruction library.
\subsubsection{\discintname}
C. Sarti, M. Mikerov, and C. Landi are employed by See Through s.r.l.
\end{credits}
%
% ---- Bibliography ----
\bibliographystyle{splncs04}
\bibliography{bibliography}

\end{document}
