\section{Introduction}
\label{sec:intro}

Modern medical imaging relies on multiple modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Unlike chest X-ray or digital mammography, MRI and CT acquire measurements that must be transformed into anatomical images through a reconstruction procedure that explicitly incorporates the physics of the acquisition process.


% Modern medical imaging essentially makes use of multiple imaging modalities such as  Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). Unlike chest X-ray or Digital Mammography, the measurements acquired by MRI and CT scanners need to undergo a complicated process of reconstruction, where a dedicated reconstruction algorithm transforms the measurements into a usable representation of the anatomy leveraging the knowledge of the underlying physics of the acquisition process. 

In accelerated MRI, the scanner collects an undersampled version of the signal's Fourier transform, providing only a masked subset of $k$-space. Reconstruction therefore requires estimating the missing data using prior knowledge about the underlying image, as in compressed sensing approaches \cite{Lustig2007}. Acquisition parameters such as acceleration factor, low-frequency sampling density, or field strength vary across protocols and hardware, and these variations directly affect image quality and noise characteristics.
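
As an illustration, the acquisition described above is often written as follows. The notation is introduced here only for this sketch, and a single-coil acquisition with additive noise is assumed for simplicity:
\[
y = M \mathcal{F} x + n, \qquad
\hat{x} \in \arg\min_{x} \; \frac{1}{2} \| M \mathcal{F} x - y \|_2^2 + \lambda R(x),
\]
where $x$ is the unknown image, $\mathcal{F}$ the Fourier transform, $M$ a binary sampling mask whose pattern is determined by the acceleration factor and the density of low-frequency samples, $n$ the measurement noise, $R$ a regularizer encoding the prior (for instance, a sparsity-promoting $\ell_1$ penalty in a transform domain), and $\lambda > 0$ its weight.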


% In case of accelerated MRI, the measurements are given in the form of undersampled k-space data as a `masked view' of the Fourier transform of the signal that we want to retrieve. The reconstruction algorithm in this case needs to find a way to fill in the missing parts of the k-space using the prior knowledge about the signal. Approaches such as compressed sensing \cite{Lustig2007} aim in reconstructing images faithfully from such data. Importantly, certain acquisition parameters such as acceleration factor, low-frequency signal density or field strength, can vary across protocols and hardware, influencing image quality and noise characteristics.

% In tomography, the measurements consist of one- or two-dimensional X-ray projections acquired as the tube and detector rotate around the patient. Fan-beam CT produces one-dimensional projection images from a fan-shaped beam, while Cone-beam CT (CBCT) produces two-dimensional projections from a cone-shaped beam. The recorded intensities reflect the attenuation coefficients along each ray, and reconstruction seeks to recover these coefficients using classical methods such as filtered back-projection \cite{Radon1986,Feldkamp84,Markoe2006} or regularized iterative reconstruction \cite{Kaipio2005}. As in MRI, acquisition settings such as tube voltage, tube current, and projection count differ across scanners and protocols; for example, lower tube current reduces radiation exposure but increases noise, which in turn affects the ideal regularization strength.



In tomography, the measurements are a set of one- or two-dimensional projection images recorded by an X-ray detector as it rotates around the patient together with the X-ray source, which is positioned on the opposite side. Fan-beam CT refers to the two-dimensional variant, in which X-rays diverge from the source in a fan shape and form a one-dimensional projection image, while Cone-beam CT (CBCT) refers to the three-dimensional variant, in which X-rays diverge in a broad cone and form a two-dimensional projection image. The recorded intensities reflect the attenuation coefficients along each ray, and reconstruction seeks to recover these coefficients using classical methods such as the filtered back-projection (FBP) algorithm \cite{Radon1986, Feldkamp84, Markoe2006} or iterative reconstruction with some form of regularization \cite{Kaipio2005}. As in MRI, acquisition settings such as tube current and projection count differ across scanners and protocols; for example, a lower tube current reduces the photon count and the associated radiation exposure but increases the noise, which in turn affects the ideal regularization strength.
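
To make the dependence of noise on tube current concrete, a common simplified model, assumed here purely for illustration (monochromatic source, no scatter, ideal photon-counting detector), reads
\[
N_i \sim \mathrm{Poisson}\bigl( I_0 \, e^{-p_i} \bigr), \qquad p_i = \int_{\ell_i} \mu(s) \, \mathrm{d}s,
\]
where $\mu$ is the attenuation coefficient, $\ell_i$ the $i$-th ray, and $I_0$ the expected number of photons per detector element in the absence of an object, which scales with the tube current. A first-order approximation gives $\mathrm{Var}\bigl[ -\log (N_i / I_0) \bigr] \approx e^{p_i} / I_0$, so lowering $I_0$ increases the noise in the estimated line integrals $p_i$.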

% Similarly to accelerated MRI, acquisition settings such as X-ray tube voltage and current or the total projection count can vary across different protocols and hardware; in particular, lower tube current would result in lower radiation exposure but increase the amount of noise in the image. The amount of noise in the image can influence the optimal hyperparameters of the reconstruction algorithm, e.g., higher amount of regularization would be used for noisier acquisitions. 


Although classical reconstruction methods remain in routine use, deep learning-based approaches have gained traction, surpassing classical methods in recent reconstruction challenges \cite{10.3389/fnins.2022.919186, Muckley2021}. Many of these approaches adopt learned iterative schemes that are inspired by classical iterative methods and incorporate the measurement operator and/or its adjoint into the network architecture. Examples include Learned Primal-Dual \cite{Adler2017b}, $\partial$U-net \cite{Hauptmann2020}, Recurrent Inference Machines \cite{Lnning2019}, and Variational Networks \cite{Hammernik2017,Yiasemis2022b}.
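
In a generic and simplified form, used here only as an illustration since the cited architectures differ in their details, such a scheme unrolls a fixed number $K$ of learned updates
\[
x^{(k+1)} = x^{(k)} + \Lambda_{\theta_k}\bigl( x^{(k)}, \, A^{*} ( A x^{(k)} - y ) \bigr), \qquad k = 0, \dots, K-1,
\]
where $A$ denotes the measurement operator, $A^{*}$ its adjoint, $y$ the measured data, and $\Lambda_{\theta_k}$ a neural network with parameters $\theta_k$ that replaces the hand-crafted update rule of a classical iterative method.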


However, variability in hardware and acquisition protocols is often not accounted for explicitly in the reconstruction network architecture. The network must then infer the acquisition parameters implicitly from the data, which can lead to inaccuracies. Training a separate model for every possible setting is generally not feasible either. This work addresses these issues by introducing a framework for \textit{conditional learned iterative schemes}, in which the model parameters are adapted in a \textit{learned} way to the physical acquisition settings of each individual sample.

Several prior works have explored conditioning neural networks on auxiliary information in medical imaging. A common strategy is feature-wise modulation of activations, where learned affine transformations condition intermediate feature maps on side information, as in FiLM \cite{perez2018film} and Adaptive Instance Normalization \cite{huang2017arbitrary}, with applications to medical image segmentation and representation learning \cite{lemay2021benefits,liu2022learning}. Another line of work relies on hypernetwork-based conditioning, where a separate network generates the weights of the reconstruction model as a function of the acquisition parameters, yielding context-specific models \cite{ramanarayanan2023mci}. More recently, adaptive convolution has been proposed for QSM dipole inversion, where convolution kernels are generated from acquisition geometry parameters within a feed-forward U-Net architecture \cite{graf2024incorporating}. In contrast, we propose a lightweight conditioning of the reconstruction operator via modulated convolutions, in which the learnable weights are partially modulated as a function of the acquisition parameters. This design is architecture-agnostic and extends naturally beyond reconstruction, enabling principled use of auxiliary acquisition information in the many medical imaging tasks for which such metadata is available.

Our contributions can be summarized as follows:
\begin{itemize}[label=$\bullet$,leftmargin=*,topsep=0pt]
  \setlength{\itemsep}{0.2em}     % space between items
  \setlength{\parskip}{0pt}
  \setlength{\parsep}{0pt}
    \item We introduce the concept of modulated convolutions within iterative image reconstruction schemes to facilitate conditional learning.
    \item We compare conditional and non-conditional learned iterative schemes for accelerated MRI and Cone-beam/Fan-beam CT reconstruction and observe consistent improvements with conditioning.
\end{itemize}
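
For intuition, one widely used form of weight modulation, given here purely as an illustrative sketch that may differ from the formulation adopted in this work, rescales the input channels of a convolution kernel $W$ with a scale vector predicted from the acquisition parameters $c$ by a small network $g_\phi$:
\[
s = g_\phi(c), \qquad \widetilde{W}_{o,i,u,v} = s_i \, W_{o,i,u,v},
\]
where $o$ and $i$ index the output and input channels and $u, v$ index the spatial kernel positions. The symbols $W$, $c$, $g_\phi$, and $s$ are introduced only for this illustration.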



 \input{figs/modocnv}