\section{Methods: Conditional learned iterative schemes}
\label{sec:sec3}



To efficiently adapt an arbitrary learned iterative reconstruction scheme to the current physical acquisition parameters, we propose to \textit{modulate} the convolutional filter intensities by a learned function of these parameters. To that end, we introduce the \textit{Modulated Convolutional} layer in \Section{sec3.1}, which replaces the standard convolutional layer employed in learned iterative reconstruction schemes.

Our motivation for modulation follows from the discussion of the optimal choice of regularization parameters in Section~\ref{sec:sec2.1}, and in particular from the Morozov discrepancy principle. In a heterogeneous dataset, the noise characteristics vary with acquisition settings such as the tube current in tomography or the field strength and acceleration parameter in MRI, so the optimal amount of regularization likely varies as well. If the learned iterative scheme is not informed about this variation, it is forced to estimate the noise characteristics from the image data directly, which can complicate learning and lead to sub-optimal results and poor generalization to other acquisition schemes.

\subsection{Modulated Convolution}
\input{figs/modulator}
% \input{figs/modocnv}
\label{sec:sec3.1}
% \subsubsection{Notation}
\begin{itemize}[label=$\bullet$,leftmargin=*]
    % \item Let $\vec{i} \in \mathbb{R}^{n \times C_{\text{in}}}$ represent an input image with spatial dimensions $n = n_1 \times n_2$ in 2D or$n = n_1 \times n_2 \times n_3$ in 3D, and $C_{\text{in}}$ as the number of input channels.
    % \item Let $\vec{k} \in \mathbb{R}^{k \times C_{\text{out}} \times C_{\text{in}}}$ denote a convolutional kernel, where $k = k_1 \times k_2$ (2D) or $k = k_1 \times k_2 \times k_3$ (3D) are the kernel's dimensions, and $C_{\text{out}}$ represents the number of output channels.
    % \item Let $\vec{o} \in \mathbb{R}^{n' \times C_{\text{out}}}$ be the output of the convolution process, with $n'$ analogous to $n$.
    \item Let $\vec{i} \in \mathbb{R}^{n \times C_{\text{in}}}$ denote an input image with spatial support 
    $n = n_1 \times n_2$ in 2D or $n = n_1 \times n_2 \times n_3$ in 3D, and $C_{\text{in}}$ the number of input channels.
    
    \item Let $\vec{k} \in \mathbb{R}^{k \times C_{\text{out}} \times C_{\text{in}}}$ represent a convolutional kernel, where 
    $k = k_1 \times k_2$ in 2D or $k = k_1 \times k_2 \times k_3$ in 3D, and $C_{\text{out}}$ the number of output channels.
    
    \item Let $\vec{o} \in \mathbb{R}^{n' \times C_{\text{out}}}$ be the resulting output image, with $n'$ defined analogously to $n$.
\end{itemize}

\noindent
Additionally, let $\vec{z} \in \mathbb{R}^{N}$ denote an auxiliary vector containing the acquisition parameters. Utilizing the aforementioned notations, we define the modulated convolution as:
% 
{
\small
\begin{equation}
    \vec{o}_{m} = \sum_{c=1}^{C_{\text{in}}} \left((\vec{W}_{\boldsymbol{\theta}})_{m,c} \cdot \vec{k}_{m,c}\right) \star \vec{i}_{c} + (\vec{b}_{\boldsymbol{\psi}})_{m}, \quad \vec{W}_{\boldsymbol{\theta}} = f_{\boldsymbol{\theta}}(\vec{z}) \in \mathbb{R}^{C_{\text{out}}\times C_{\text{in}}}, \quad \vec{b}_{\boldsymbol{\psi}} = g_{\boldsymbol{\psi}}(\vec{z}) \in \mathbb{R}^{C_{\text{out}}},
    \label{eq:mod_conv}
\end{equation}
}
% 
\noindent
where $\cdot$ denotes scalar multiplication, $\star$ denotes the cross-correlation operation, $m=1,\ldots, C_{\text{out}}$, and $f_{\boldsymbol{\theta}}$ and $g_{\boldsymbol{\psi}}$ are the components of the modulation model, implemented as trainable multi-layer perceptrons (MLPs). These MLPs take the auxiliary vector $\vec{z}$ as input and produce, respectively, the modulating weights that rescale the convolutional kernel and a bias, both tailored to the acquisition parameters. The convolution thereby becomes conditioned on the auxiliary variable. Note that the conventional (\textit{unmodulated}) convolution is recovered by setting $f_{\boldsymbol{\theta}} \equiv \mathbf{1} \in \mathbb{R}^{C_{\text{out}}\times C_{\text{in}}}$ and $g_{\boldsymbol{\psi}} \equiv \boldsymbol{\psi} \in \mathbb{R}^{C_{\text{out}}}$, that is, by making both independent of $\vec{z}$. Each MLP consists of linear layers followed by parametric ReLU (PReLU) activation functions, with a Softplus activation applied after the final layer.
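
\noindent
As a sketch of this structure, with the depth $L$, the matrices $\vec{A}_{\ell}$, the offsets $\vec{c}_{\ell}$ (all collected in $\boldsymbol{\theta}$) and the hidden features $\vec{h}_{\ell}$ introduced here purely for illustration, and assuming that the Softplus directly follows the last linear layer, $f_{\boldsymbol{\theta}}$ may be written as
\[
    \vec{h}_{0} = \vec{z}, \qquad
    \vec{h}_{\ell} = \text{PReLU}\left(\vec{A}_{\ell}\vec{h}_{\ell-1} + \vec{c}_{\ell}\right), \quad \ell = 1,\ldots,L-1, \qquad
    f_{\boldsymbol{\theta}}(\vec{z}) = \text{Softplus}\left(\vec{A}_{L}\vec{h}_{L-1} + \vec{c}_{L}\right),
\]
where the output is reshaped to $C_{\text{out}}\times C_{\text{in}}$, $\text{PReLU}(x) = \max(0,x) + a\min(0,x)$ with a learnable slope $a$, and $\text{Softplus}(x) = \log(1+e^{x})$, both applied elementwise. Since $\text{Softplus}(x) > 0$ for all $x$, the modulating weights are strictly positive under this reading, so modulation rescales each filter without changing its sign.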

In \Fig{diagram} we depict a modulated convolution, while the architecture of the modulation model is illustrated in \Fig{modulation}. A more general form of the proposed conditioning method, covering other types of modulation, is detailed in \Appendix{appendix2-genmodconv}.
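
\noindent
It may help to note that, since cross-correlation is linear in the kernel and each entry of $\vec{W}_{\boldsymbol{\theta}}$ is a scalar, the modulation in \eqref{eq:mod_conv} is equivalent to reweighting the channel-wise responses. For an output channel $m$ and an input channel $c$ (an index named here for illustration),
\[
    \left((\vec{W}_{\boldsymbol{\theta}})_{m,c} \cdot \vec{k}_{m,c}\right) \star \vec{i}_{c}
    = (\vec{W}_{\boldsymbol{\theta}})_{m,c} \left(\vec{k}_{m,c} \star \vec{i}_{c}\right).
\]
As a toy example with values assumed purely for illustration, take $C_{\text{in}} = 2$, $C_{\text{out}} = 1$, $\vec{W}_{\boldsymbol{\theta}} = (0.5,\ 2)$ and $\vec{b}_{\boldsymbol{\psi}} = 0.1$. Then
\[
    \vec{o}_{1} = 0.5\left(\vec{k}_{1,1} \star \vec{i}_{1}\right) + 2\left(\vec{k}_{1,2} \star \vec{i}_{2}\right) + 0.1,
\]
so the contribution of the first input channel is attenuated and that of the second is amplified relative to the unmodulated layer.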
% However, this main paper focuses on the method presented in \Section{sec3.1}.

\subsection{Modulated Transposed Convolution}

Many convolution-based models, particularly those with an encoder-decoder structure such as U-Net \cite{ronneberger2015u}, combine convolutions in the encoder with transposed convolutions in the decoder. For example, in our accelerated MRI reconstruction and cone-beam CT reconstruction experiments, we employ vSHARP and $\partial$U-net, respectively, both of which incorporate transposed convolutions. Building upon Modulated Convolution, we therefore introduce \textit{Modulated Transposed Convolutions}, in which the transposed convolution kernels and biases are modulated by the auxiliary variable in the same way as in \eqref{eq:mod_conv}. This ensures that both the encoding and the decoding paths of our models are modulated.
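
\noindent
Written out in notation assumed here for illustration, with $\star^{\top}$ denoting the transposed cross-correlation (the adjoint of $\star$ with respect to its input), $\tilde{\vec{k}}$ the transposed-convolution kernel and $c$ the input-channel index, a modulated transposed convolution would take the form
\[
    \vec{o}_{m} = \sum_{c=1}^{C_{\text{in}}} \left((\vec{W}_{\boldsymbol{\theta}})_{m,c} \cdot \tilde{\vec{k}}_{m,c}\right) \star^{\top} \vec{i}_{c} + (\vec{b}_{\boldsymbol{\psi}})_{m}, \qquad m = 1,\ldots,C_{\text{out}},
\]
with $\vec{W}_{\boldsymbol{\theta}} = f_{\boldsymbol{\theta}}(\vec{z})$ and $\vec{b}_{\boldsymbol{\psi}} = g_{\boldsymbol{\psi}}(\vec{z})$ obtained from the auxiliary vector exactly as in \eqref{eq:mod_conv}.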



\subsection{Deep Learning Reconstruction Backbones}
Our modulation mechanism is architecture-agnostic and can be incorporated into a wide range of learned iterative schemes. In this work, we evaluate it within three representative backbones, one for each considered modality:

\paragraph{Iterative ADMM DL-based Accelerated MRI Reconstruction}
% 
% A wide range of deep learning approaches have been proposed for accelerated MRI, with many relying on unrolled iterative schemes that embed the acquisition physics within a learned optimization procedure. Examples include gradient-descent unrolling in either image or frequency domains \cite{Hammernik2017,Lnning2019,Sriram2020,Yiasemis2022b} and first-order methods based on proximal gradient \cite{Luo2023}, conjugate gradient \cite{Kim2022}, or ADMM \cite{10.1007/978-3-031-52448-6_45}.

In our experiments, we adopt an ADMM-based unrolled reconstruction framework in which each iteration alternates a data-consistency update with a learned denoising block. For static 2D reconstruction experiments, we use a 2D reconstruction model operating on individual slices, while for dynamic 2D reconstruction experiments we employ a 3D (2D+time) model that jointly processes spatial and temporal dimensions. All convolutional and transposed-convolutional layers within the learnable components are replaced by the proposed modulated convolutions, enabling the network to adjust its behaviour according to the acquisition parameters of each sample. The full set of update equations, initialization strategy, sensitivity-map refinement module, and network architectures follow the vSHARP formulation \cite{yiasemis2023vsharp}. Complete mathematical details and implementation specifics are provided in \Appendix{appendix2-vsharp}.

\paragraph{$\partial$U-net}
% 
For CBCT we adopt $\partial$U-net \cite{HauptmannCode}, a multi-scale learned iterative scheme operating purely in the image domain. Modulation is applied to every convolutional block across all resolution levels, allowing the model to track changes in acquisition conditions. A complete description of the hierarchical structure and the initialization procedure appears in \Appendix{appendix2-cbct}.


% As a baseline learned iterative scheme for fast and memory-efficient Cone-beam CT reconstruction, we used $\partial$U-net \cite{Hauptmann2020}, which is a multi-scale learned iterative scheme operating in image domain only at four different resolution scales of $1, \frac 1 2, \frac 1 4$ and $\frac 1 8$ starting at the lowest resolution. The network blocks operating at reduced resolutions are small convolutional neural networks consisting of $3$ convolutional layers with ReLU activations and normalization layers, while the network block operating at full resolution is a 3d U-net that combines the intermediate reconstructions to obtain the final image. FDK reconstruction with ramp filter and frequency cut-off at $95\%$ was provided as the initial reconstruction to the network.

\paragraph{Learned Primal-Dual Reconstruction}
% 
For 2D CT we use the Learned Primal--Dual (LPD) algorithm \cite{Adler2017b}, which alternates image-space (primal) and projection-space (dual) updates connected via differentiable projection and backprojection operators. All convolutional layers within both primal and dual updates are replaced by modulated convolutions. Architectural and algorithmic specifics are provided in \Appendix{appendix2-fanbeam}.
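
\noindent
Schematically, and in notation assumed here for illustration ($\mathcal{A}$ for the projection operator, $\mathcal{A}^{*}$ for backprojection, $\vec{y}$ for the measured projections, $\vec{f}_{i}$ and $\vec{h}_{i}$ for the primal and dual variables, and $\Lambda_{i}$, $\Gamma_{i}$ for the $i$-th primal and dual blocks), one modulated LPD iteration can be summarized as
\[
    \vec{h}_{i} = \Gamma_{i}\left(\vec{h}_{i-1},\, \mathcal{A}\vec{f}_{i-1},\, \vec{y};\, \vec{z}\right), \qquad
    \vec{f}_{i} = \Lambda_{i}\left(\vec{f}_{i-1},\, \mathcal{A}^{*}\vec{h}_{i};\, \vec{z}\right),
\]
where the dependence on the auxiliary vector $\vec{z}$ enters only through the modulated convolutions inside $\Gamma_{i}$ and $\Lambda_{i}$. This simplified form omits the multi-channel primal and dual memory of the original formulation.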

% As a baseline learned iterative scheme for CT reconstruction, we used Learned Primal-Dual algorithm (LPD) \cite{Adler2017b}. LPD is a learned iterative scheme inspired by the Primal-Dual Hybrid Gradient method\cite{Chambolle2011}, which, unlike $\partial$U-net, makes use of both image-space and projection-space operations in an end-to-end trainable network. Image-space computations are performed by \emph{primal blocks} and projection-space computations are performed by \emph{dual blocks}, all primal/blocks being small convolutional neural networks with $3$ convolutional layers, parametric ReLU (PReLU) activation functions and batch normalization layers. To connect primal and dual blocks, projection and backprojection operators are used. Unlike $\partial$U-net, LPD is not designed for optimal memory efficiency and cannot be applied to CBCT directly due to memory limitations.


