\section{Introduction}
\label{sec:1_intro}
A chest X-ray (CXR) is a fast, effective, and relatively inexpensive aid for diagnosing and monitoring thoracic conditions \cite{c12}. Because CXRs are so prevalent in clinical practice, they are a prominent image source for medical deep learning applications. Large datasets for radiological finding classification are common, often supported by automated labeling. In contrast, curating datasets for localization and segmentation requires costly, time-intensive manual annotation by medical experts. As a result, such datasets are not only scarce but also tend to be small and limited in scope.

To date, only eight publicly available CXR datasets provide bounding boxes for more than one radiological finding \cite{detection_NIH,ms-cxr-1,vindr1,detection_vindrpcxr,detection_reflacx,padchest,cxral14,chestxdet10}, and only two provide segmentation masks \cite{chestxdet10,chexlocalize}. Of these eight localization datasets, only two contain more than $10\,000$ scans.\footnote{For comparison, PASCAL VOC (20 classes, $22\,591$ images) is considered one of the smallest widely used benchmarks for object detection in computer vision.} Moreover, these datasets suffer from class imbalance: even in the largest one, CXR-AL14 (about $165\,000$ images), some common thoracic findings are underrepresented. For instance, atelectasis appears in only $0.2\%$ of its images.

\begin{figure}[t]
    \centering
    \includegraphics[width=.8\textwidth]{MIDLLatexTemplate-master/imgs/1_main.pdf}
    \caption{We propose \emph{SemiSynCXR}, a framework that generates semi-synthetic CXRs by inpainting radiological findings into healthy images at plausible anatomical locations. \emph{SemiSynCXR} automatically produces both the edited image and a precise bounding box, directly addressing the data scarcity and class imbalance in existing localization datasets.}
    \label{fig:1_main}
\end{figure}

Data synthesis offers a promising way to overcome data scarcity in medical imaging \cite{c3.1,c3.0c4.0,ktena}. Building on advances in high-quality image generation, such as latent diffusion models \cite{ldm}, existing studies have demonstrated the potential of generating synthetic CXRs \cite{c71roent,cascade,RLCXR,chestdiff}. However, they mostly focus on creating unlabeled images, image-text pairs, or images with only global classification labels, leaving the generation of much-needed localization datasets unaddressed.

Moreover, when localization datasets are built from fully synthetic images, obtaining the exact position of each finding is a significant challenge: because the generative model implicitly decides where a finding appears, its precise location often remains unknown. We therefore propose semi-synthetic image generation as an alternative, as illustrated in \figureref{fig:1_main}. By using automated image editing to inpaint findings into healthy CXRs, we explicitly define each finding's location through a conditioning mask. Consequently, every generated image comes with a ground-truth bounding box by construction, which overcomes a core limitation of fully synthetic approaches.
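Because the conditioning mask is known before inpainting, a bounding box can be read off directly as the tightest axis-aligned rectangle enclosing the mask. The following minimal Python sketch illustrates this step; it assumes a binary NumPy mask and that the inpainted finding fills the masked region. The helper \texttt{mask\_to\_bbox} is hypothetical and not part of \emph{SemiSynCXR}.

\begin{verbatim}
import numpy as np


def mask_to_bbox(mask):
    """Hypothetical helper: tight box around a binary mask.

    Returns (x_min, y_min, x_max, y_max) in inclusive pixel
    coordinates, or None if the mask is empty.
    """
    ys, xs = np.nonzero(mask)  # row/column indices of mask pixels
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Usage: a rectangular mask covering rows 2-4 and columns 3-6
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True
print(mask_to_bbox(mask))  # (3, 2, 6, 4)
\end{verbatim}

For irregular masks, the same rule still gives a box that encloses every masked pixel, so the label is consistent with the edited region without any manual annotation.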

Our contributions are summarized as follows:
\begin{itemize}
\item We introduce \emph{SemiSynCXR}, a framework for automatically generating semi-synthetic CXRs with radiological findings. Its core strength is that, at scale, every generated image comes with a precise bounding box that matches the inpainted finding by construction.

\item To this end, we develop an automated mask generation method for \emph{SemiSynCXR} that places findings at anatomically plausible locations based on real-world spatial distributions. \emph{SemiSynCXR} then uses existing diffusion models to inpaint findings into healthy CXR images, obviating the need to train new models.

\item Using \emph{SemiSynCXR}, we create a semi-synthetic dataset for CXR finding localization. Augmenting real training data with our generated samples significantly improves object detection performance and robustness, effectively mitigating data scarcity.

\item Extensive quantitative and qualitative evaluations confirm that the generated findings are realistically placed and that the resulting CXRs resemble real images.

\end{itemize}
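The idea of placing findings according to real-world spatial distributions, named in the second contribution, can be illustrated with a hypothetical sketch. The function \texttt{sample\_finding\_mask}, the histogram-based prior, the bin count, and the reuse of real box sizes are illustrative assumptions and not the procedure used by \emph{SemiSynCXR}. In this sketch, the centres of real annotated boxes for one finding are binned into a 2D histogram. A bin is then drawn in proportion to its count, and a box of realistic size is rasterized into a binary conditioning mask.

\begin{verbatim}
import numpy as np


def sample_finding_mask(real_boxes, image_size, bins=16, rng=None):
    """Hypothetical sketch: place one finding by sampling from the
    empirical spatial distribution of real annotations.

    real_boxes: (N, 4) normalized (x_min, y_min, x_max, y_max) in [0, 1]
    image_size: (H, W) of the healthy target CXR
    Returns a boolean mask and its inclusive pixel bounding box.
    """
    rng = np.random.default_rng() if rng is None else rng
    boxes = np.asarray(real_boxes, dtype=float)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    # 2D histogram of real box centres acts as the spatial prior
    hist, _, _ = np.histogram2d(cy, cx, bins=bins,
                                range=[[0, 1], [0, 1]])
    p = hist.ravel() / hist.sum()
    row, col = divmod(int(rng.choice(p.size, p=p)), bins)
    # jitter the centre uniformly inside the sampled bin
    cy_s = (row + rng.random()) / bins
    cx_s = (col + rng.random()) / bins
    # reuse the width/height of one randomly drawn real box
    k = rng.integers(len(boxes))
    w = boxes[k, 2] - boxes[k, 0]
    h = boxes[k, 3] - boxes[k, 1]
    H, W = image_size
    # convert to pixel coordinates, clipped to the image
    x0 = int(np.clip((cx_s - w / 2) * W, 0, W - 1))
    x1 = int(np.clip((cx_s + w / 2) * W, x0 + 1, W))
    y0 = int(np.clip((cy_s - h / 2) * H, 0, H - 1))
    y1 = int(np.clip((cy_s + h / 2) * H, y0 + 1, H))
    mask = np.zeros((H, W), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask, (x0, y0, x1 - 1, y1 - 1)


# Usage with two made-up normalized boxes
rng = np.random.default_rng(0)
real = [[0.20, 0.30, 0.40, 0.50], [0.22, 0.28, 0.38, 0.52]]
mask, bbox = sample_finding_mask(real, (256, 256), rng=rng)
\end{verbatim}

Because the mask is generated first, its bounding box is known exactly before any inpainting takes place. A finer histogram or a kernel density estimate would give a smoother prior. Anatomical constraints, such as restricting placement to the lung fields, are not modelled in this sketch.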