\section{Optimizing Speed in Sliding Window Inference}
\label{appx:speed}

Since inference time is also an evaluation metric in the STSR 2025 challenge, we tune the parameters of sliding window inference to improve inference speed without significantly degrading model performance (\ie the average score of DSC, NSD, mIoU, and IA). \cref{fig:speed_tradeoff} illustrates the tradeoff between model performance and inference time for different tile sizes and mirror-axis combinations in test-time augmentation (TTA). As most of the voxels in the CBCT image belong to the background class, setting the tile size to 0.9 reduces the inference time by 53\% with a negligible drop of only 0.002 in the average score. Furthermore, \cref{fig:speed_tradeoff} shows that although mirroring along all axes yields the best performance, it comes at the cost of a long inference time.
The mirror-axis combination `1,2' offers the best tradeoff, retaining a good average score with an inference time of only 17.08 seconds.
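
As a rough guide to why the choice of mirror axes affects runtime, suppose that the TTA evaluates the model once for every subset of the selected mirror axes $\mathcal{A}$, including the unflipped input, and averages the resulting predictions. This is an assumption about the implementation and is not confirmed by the experiments above. The number of forward passes per tile would then be
\[
    N_{\mathrm{pass}} = 2^{|\mathcal{A}|},
\]
which gives $N_{\mathrm{pass}} = 1$ without mirroring, $N_{\mathrm{pass}} = 4$ for `1,2', and $N_{\mathrm{pass}} = 8$ when mirroring along all three axes. Under this assumption, mirroring along all axes requires about twice the network computation of `1,2', although the measured inference time also includes overheads such as data loading and resampling, so the ratio of wall-clock times need not be exactly two.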

\begin{figure}[htb]
    \begin{subfigure}{0.52\textwidth}
        \centering 
        \includegraphics[width=\linewidth]{fig/ts_abl_sts.png}
    \end{subfigure} \hfill
    \begin{subfigure}{0.47\textwidth}
        \centering 
        \includegraphics[width=\linewidth]{fig/mirror_abl_sts.png}
    \end{subfigure}
    \caption{(Left): Effect of the tile size on the metrics with `1,2' mirror axes in TTA. (Right): Effect of various mirror-axis combinations in TTA on the metrics when the tile size is set to 0.9. Axis definition: `0' is superior/inferior, `1' is anterior/posterior, and `2' is left/right.}
    \label{fig:speed_tradeoff}
\end{figure}
