

\section{Results and Analysis}


We evaluate our framework along three criteria: 
(i) radiologist evaluation, 
(ii) quantitative assessment on standard image metrics, and
(iii) downstream impact on nodule detection models.




\subsection{Radiologist Evaluation}
\noindent \textbf{Task 1 (Realism).} We presented a balanced set of 50 nodules (25 real and 25 generated by our base diffusion model) to three expert radiologists, who labeled each nodule as real or synthetic. On average, \textbf{90\%} of real nodules were correctly identified as real, while \textbf{80\%} of synthetic nodules were also labeled as real. Because synthetic nodules were judged real nearly as often as genuine ones, this suggests that the base model produces highly realistic nodules that experts often cannot distinguish from real cases.
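The reported percentages can be read as per-class rates averaged over readers. The sketch below shows that computation, assuming each reader assigns a binary real/synthetic label to every nodule; the function \texttt{realism\_rates} and the toy labels are hypothetical and are not the study data.

```python
from statistics import mean


def realism_rates(is_real, reader_labels):
    """Return (fraction of real nodules labeled real,
    fraction of synthetic nodules labeled real), each averaged over readers.

    is_real: ground-truth flags, one per nodule (True = real nodule).
    reader_labels: {reader_id: [True if the reader labeled nodule i "real"]}.
    """
    real_idx = [i for i, r in enumerate(is_real) if r]
    syn_idx = [i for i, r in enumerate(is_real) if not r]
    real_rates, syn_rates = [], []
    for labels in reader_labels.values():
        # Per-reader rate of "real" labels within each ground-truth class.
        real_rates.append(sum(labels[i] for i in real_idx) / len(real_idx))
        syn_rates.append(sum(labels[i] for i in syn_idx) / len(syn_idx))
    return mean(real_rates), mean(syn_rates)


# Hypothetical toy data: 4 real and 4 synthetic nodules, 3 readers.
is_real = [True] * 4 + [False] * 4
readers = {
    "R1": [True, True, True, False, True, True, False, True],
    "R2": [True, True, True, True, True, False, True, True],
    "R3": [True, True, False, True, True, True, True, False],
}
real_as_real, syn_as_real = realism_rates(is_real, readers)
print(f"real labeled real: {real_as_real:.3f}, synthetic labeled real: {syn_as_real:.3f}")
```

A synthetic-as-real rate close to the real-as-real rate is what indicates that readers cannot reliably separate the two classes.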


\noindent \textbf{Task 2 (Controllability).} To evaluate the characteristic-specific LoRA adapters, we generated 10 nodules per target characteristic (e.g., border type, texture) and asked three radiologists to verify whether each nodule exhibited the intended trait. Table~\ref{tab:characteristic-accuracy} reports the majority-agreement rate for each characteristic, i.e., the percentage of nodules for which a majority of radiologists confirmed the trait. Agreement ranged from 80\% for calcification to 100\% for irregular borders and inhomogeneous texture. %Agreement was consistently high, reaching 100\% for irregular borders and inhomogeneous texture, confirming that LoRA adapters enable fine-grained and clinically interpretable control over generated nodules.
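Under the reading that each entry is the share of the 10 generated nodules for which at least two of the three radiologists confirmed the intended trait, the computation is a simple majority vote. The sketch below assumes binary votes; \texttt{majority\_agreement} and the votes are hypothetical. With 10 nodules per characteristic, every rate is a multiple of 10 percentage points, consistent with the values in Table~\ref{tab:characteristic-accuracy}.

```python
def majority_agreement(votes_per_nodule):
    """Fraction of nodules for which a strict majority of readers confirmed the trait.

    votes_per_nodule: one list of binary votes per nodule (1 = trait confirmed).
    """
    confirmed = 0
    for votes in votes_per_nodule:
        if sum(votes) > len(votes) / 2:  # strict majority, i.e. >= 2 of 3 readers
            confirmed += 1
    return confirmed / len(votes_per_nodule)


# Hypothetical toy votes: 10 nodules generated for one target trait, 3 readers each.
votes = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 1],
    [1, 1, 1], [0, 1, 1], [1, 1, 1], [1, 0, 0], [1, 1, 1],
]
print(majority_agreement(votes))  # 8 of 10 nodules have at least 2 confirming votes
```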




\noindent \textbf{Task 3 (Subtlety).} We evaluated the Subtlety LoRA on 20 synthetic nodule patches, each rendered at three subtlety levels from the same mask, as shown in Figure~\ref{fig:scale_comparison}. Radiologists were asked to order the renderings from the most obvious to the most subtle nodule. The majority consensus ordering matched our intended subtlety scale in 80\% of cases (16 of 20), indicating a clear and clinically relevant progression in the subtlety of the generated nodules.
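One way to operationalize the majority consensus ordering is sketched below. It assumes that each radiologist submits a full ranking of the three renderings and that a case counts as aligned only if a strict majority of radiologists submitted the same ranking and that ranking matches the intended scale. Other definitions (e.g., rank correlation with the intended order) are possible; the helper names and orderings are hypothetical.

```python
from collections import Counter


def consensus_matches(reader_orderings, intended):
    """True if a strict majority of readers gave the same ordering and that
    ordering equals the intended one (most obvious first)."""
    ordering, count = Counter(reader_orderings).most_common(1)[0]
    has_majority = count > len(reader_orderings) / 2
    return has_majority and ordering == intended


def alignment_rate(cases):
    """Fraction of cases whose majority consensus ordering matches the intended order."""
    return sum(consensus_matches(orders, intended) for orders, intended in cases) / len(cases)


# Hypothetical toy data: renderings "s1" (most obvious) to "s3" (most subtle),
# three readers per case.
target = ("s1", "s2", "s3")
cases = [
    ([("s1", "s2", "s3"), ("s1", "s2", "s3"), ("s2", "s1", "s3")], target),  # consensus matches
    ([("s1", "s3", "s2"), ("s1", "s3", "s2"), ("s1", "s2", "s3")], target),  # consensus differs
    ([("s1", "s2", "s3"), ("s2", "s1", "s3"), ("s1", "s3", "s2")], target),  # no majority
    ([("s1", "s2", "s3")] * 3, target),                                      # unanimous match
]
print(alignment_rate(cases))
```

Requiring identical full rankings is a strict criterion; a case where readers agree on the most subtle rendering but swap the other two counts as misaligned.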

\begin{table}[h]
  \centering
  \scriptsize               % smaller font
  \setlength{\tabcolsep}{4pt} % reduce column spacing
  \renewcommand{\arraystretch}{0.85} % tighter rows
  \caption{Radiologist evaluation of characteristic-specific LoRA modules.}
  \label{tab:characteristic-accuracy}
  \begin{tabular}{@{} l r @{}} 
    \toprule
    \textbf{Nodule characteristic} & \textbf{Agreement (\%)} \\
    \midrule
    Calcification          & 80 \\
    Regular border         & 90 \\
    Irregular border       & 100 \\
    Homogeneous texture    & 90 \\
    Inhomogeneous texture  & 100 \\
    \bottomrule
  \end{tabular}
\end{table}


\input{sec/merge-comparison}




\subsection{Downstream Evaluation}
\textbf{Diffusion Baseline Evaluation:} We evaluated the detection performance of models trained with synthetic nodules from our baseline diffusion model. A Swin-Tiny~\cite{liu2021swintransformerhierarchicalvision} encoder combined with a U-Net++~\cite{zhou2018unetnestedunetarchitecture} decoder was trained jointly for nodule classification and segmentation. The training set consisted of approximately 10k real nodules and nearly 100k normal CXRs, supplemented with 2k to 10k synthetic nodules.
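The section does not state the training objective. As an illustration only, a common choice for joint classification and segmentation is an image-level binary cross-entropy plus a soft Dice term on the mask. The sketch below assumes that combination, a hypothetical weight \texttt{seg\_weight}, and that the segmentation term is applied only to nodule-positive images, since the normal CXRs have empty masks.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for a single probability p and label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def soft_dice_loss(pred, target, eps=1e-6):
    """1 - soft Dice between flattened per-pixel probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1 - (2 * inter + eps) / (denom + eps)


def joint_loss(cls_prob, cls_label, mask_prob, mask_label, seg_weight=1.0):
    """Hypothetical joint objective: classification BCE, plus a weighted Dice
    term for nodule-positive images only (normal CXRs have empty masks)."""
    loss = bce(cls_prob, cls_label)
    if cls_label == 1:
        loss += seg_weight * soft_dice_loss(mask_prob, mask_label)
    return loss


# Toy 4-pixel examples: one nodule image and one normal image.
print(round(joint_loss(0.9, 1, [0.8, 0.1, 0.9, 0.0], [1, 0, 1, 0]), 4))
print(round(joint_loss(0.2, 0, [0.3, 0.1, 0.0, 0.0], [0, 0, 0, 0]), 4))
```

Gating the Dice term on positive images avoids penalizing every normal CXR with a near-constant Dice loss; whether the authors use this design is not stated.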

\noindent As shown in Table~\ref{tab:quantitative-eval-wide}, augmenting the training data with our synthetic nodules improved AUC on all three test sets at every augmentation level, and improved IoU in every setting except 10k synthetic nodules on the in-house set. Two trends emerge: (1) the gains are largest on the external JSRT and ChestX-ray14 test sets, and (2) on the in-house and ChestX-ray14 sets, performance plateaus or slightly declines beyond 6k to 8k synthetic nodules, whereas JSRT performance peaks at 10k. This suggests that the proportion of synthetic data should be tuned rather than simply maximized.
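These claims can be checked directly against Table~\ref{tab:quantitative-eval-wide}. The short script below uses values transcribed from that table and lists every configuration that falls below the 10k-real baseline, as well as the best configuration per test set and metric. The variable and function names are illustrative only.

```python
# (AUC, IoU) per test set, transcribed from the synthetic-quantity table.
BASELINE = {"In-house": (0.9705, 0.3090), "JSRT": (0.8560, 0.2475), "ChestX-ray14": (0.9008, 0.5285)}
AUGMENTED = {
    "2k": {"In-house": (0.9788, 0.3222), "JSRT": (0.8639, 0.2743), "ChestX-ray14": (0.9168, 0.5293)},
    "4k": {"In-house": (0.9780, 0.3197), "JSRT": (0.8780, 0.2589), "ChestX-ray14": (0.9245, 0.5500)},
    "6k": {"In-house": (0.9802, 0.3247), "JSRT": (0.8940, 0.2894), "ChestX-ray14": (0.9315, 0.5750)},
    "8k": {"In-house": (0.9796, 0.3274), "JSRT": (0.8864, 0.2923), "ChestX-ray14": (0.9341, 0.5954)},
    "10k": {"In-house": (0.9801, 0.3056), "JSRT": (0.9023, 0.3091), "ChestX-ray14": (0.9318, 0.5613)},
}
METRICS = ("AUC", "IoU")


def deltas():
    """Change of each augmented configuration relative to the 10k-real baseline."""
    out = {}
    for cfg, results in AUGMENTED.items():
        for test_set, values in results.items():
            for metric, value, base in zip(METRICS, values, BASELINE[test_set]):
                out[(cfg, test_set, metric)] = value - base
    return out


def best_config(test_set, metric):
    """Amount of synthetic data that maximizes a metric on a test set."""
    idx = METRICS.index(metric)
    return max(AUGMENTED, key=lambda cfg: AUGMENTED[cfg][test_set][idx])


regressions = [key for key, delta in deltas().items() if delta < 0]
print("below baseline:", regressions)
for test_set in BASELINE:
    print(test_set, {m: best_config(test_set, m) for m in METRICS})
```

The only regression it reports is in-house IoU with 10k synthetic nodules, and the best in-house IoU is reached with 8k synthetic nodules.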

\begin{table*}[h]  % table* makes it span both columns if you're in 2-column format
  \centering
  \scriptsize
  \setlength{\tabcolsep}{4pt}
  \renewcommand{\arraystretch}{0.95}
  \caption{Downstream performance when increasing amounts of synthetic nodules are added to 10k real nodules. We report classification AUC and best segmentation IoU on three test sets; the best value in each column is shown in bold.}
  \label{tab:quantitative-eval-wide}

  \begin{tabular}{@{} l cc cc cc @{}}
    \toprule
    \textbf{Train Data} 
      & \multicolumn{2}{c}{\textbf{In-house}} 
      & \multicolumn{2}{c}{\textbf{JSRT}} 
      & \multicolumn{2}{c}{\textbf{ChestX-ray14}} \\
    \cmidrule(lr){2-3} \cmidrule(lr){4-5} \cmidrule(lr){6-7}
    & \textbf{AUC} & \textbf{IoU}
    & \textbf{AUC} & \textbf{IoU}
    & \textbf{AUC} & \textbf{IoU} \\
    \midrule

    10k real
      & 0.9705 & 0.3090
      & 0.8560 & 0.2475
      & 0.9008 & 0.5285 \\

    10k real + 2k syn
      & 0.9788 & 0.3222
      & 0.8639 & 0.2743
      & 0.9168 & 0.5293 \\

    10k real + 4k syn
      & 0.9780 & 0.3197
      & 0.8780 & 0.2589
      & 0.9245 & 0.5500 \\

    10k real + 6k syn
      & \textbf{0.9802} & 0.3247
      & 0.8940 & 0.2894
      & 0.9315 & 0.5750 \\

    10k real + 8k syn
      & 0.9796 & \textbf{0.3274}
      & 0.8864 & 0.2923
      & \textbf{0.9341} & \textbf{0.5954} \\

    10k real + 10k syn
      & 0.9801 & 0.3056
      & \textbf{0.9023} & \textbf{0.3091}
      & 0.9318 & 0.5613 \\

    \bottomrule
  \end{tabular}
\end{table*}
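The metrics are not defined further in this section. The sketch below assumes the usual rank-based ROC AUC for image-level nodule scores and interprets best IoU as the mean per-image IoU maximized over a grid of binarization thresholds; both the threshold grid and the helper names are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive scores higher than a random negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def iou(pred_mask, true_mask):
    """IoU of two binary masks; two empty masks count as a perfect match."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    union = sum(1 for p, t in zip(pred_mask, true_mask) if p or t)
    return inter / union if union else 1.0


def best_iou(prob_maps, true_masks, thresholds=tuple(i / 10 for i in range(1, 10))):
    """Return (threshold, mean IoU) for the threshold with the highest mean IoU."""
    best = (None, -1.0)
    for th in thresholds:
        per_image = [iou([p >= th for p in probs], mask) for probs, mask in zip(prob_maps, true_masks)]
        mean_iou = sum(per_image) / len(per_image)
        if mean_iou > best[1]:
            best = (th, mean_iou)
    return best


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
print(best_iou([[0.9, 0.6, 0.2, 0.1]], [[1, 1, 0, 0]], thresholds=(0.3, 0.5, 0.7)))
```

Selecting the threshold on the test set makes best IoU an optimistic upper bound; a threshold fixed on validation data would give a more conservative figure.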

\noindent \textbf{Characteristic-Specific LoRA Adapters Evaluation:} We trained a Swin-Tiny~\cite{liu2021swintransformerhierarchicalvision} encoder with a multi-head classification module covering all radiological characteristics on 5k real nodules, and augmented the training set with approximately 400 synthetic nodules per characteristic (2k in total). As shown in Table~\ref{tab:comp_5k}, augmentation improved IoU for every characteristic on our in-house test set, with gains ranging from 0.0257 (inhomogeneous) to 0.0360 (regular border). We also evaluate subtlety on the JSRT subtlety dataset; results are provided in Appendix~\ref{sec:subtlety-eval}.
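The multi-head module is not specified further. A minimal sketch, assuming one linear head with a sigmoid output per characteristic on top of a shared pooled feature vector (standing in for the Swin-Tiny features), trained with binary cross-entropy averaged over the characteristics annotated for a given image; the head parameters, the two-dimensional feature, and all names are hypothetical.

```python
import math

# Characteristics follow the rows of the 5k-real comparison table.
CHARACTERISTICS = ("nodule", "calcification", "regular_border",
                   "irregular_border", "homogeneous", "inhomogeneous")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_head_forward(features, heads):
    """heads maps a characteristic to (weights, bias) of a linear head applied
    to the shared feature vector; returns one probability per characteristic."""
    return {name: sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
            for name, (weights, bias) in heads.items()}


def multi_head_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over heads with a label; None = not annotated."""
    terms = []
    for name, y in labels.items():
        if y is None:
            continue  # skip characteristics without a label for this image
        p = min(max(probs[name], eps), 1 - eps)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    return sum(terms) / len(terms)


# Toy example: 2-dimensional shared feature, all-zero heads except "nodule".
heads = {name: ([0.0, 0.0], 0.0) for name in CHARACTERISTICS}
heads["nodule"] = ([2.0, 0.0], 0.0)
probs = multi_head_forward([1.0, -1.0], heads)
labels = {name: None for name in CHARACTERISTICS}
labels["nodule"] = 1
labels["calcification"] = 0
print(round(multi_head_loss(probs, labels), 4))
```

Masking unlabeled heads lets images that carry only some characteristic annotations still contribute to training.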



% Requires \usepackage{booktabs}
\begin{table}[t]
\centering
\caption{IoU on the in-house test set for models trained on 5k real nodules versus 5k real nodules plus 2k characteristic-specific synthetic nodules, per radiological characteristic.}
\label{tab:comp_5k}
\setlength{\tabcolsep}{6pt}
\renewcommand{\arraystretch}{0.95}
\scriptsize
\begin{tabular}{@{}lcc@{}}
\toprule
\textbf{Characteristic} & \textbf{5k Real} & \textbf{5k Real + 2k Synthetic} \\
\midrule
Nodule            & 0.2696 & \textbf{0.3002}\\
Calcification      & 0.2879 & \textbf{0.3199} \\
Regular Border    & 0.2941 & \textbf{0.3301} \\
Irregular Border  & 0.2733 & \textbf{0.3080} \\
Homogeneous        & 0.2695 & \textbf{0.3050} \\
Inhomogeneous      & 0.2706 & \textbf{0.2963} \\
\bottomrule
\end{tabular}
\end{table}


\subsection{Comparison with Existing Methods} 
To ensure a fair comparison, all methods were trained on the same in-house dataset. We benchmark three families of generative approaches: GAN-based models, inpainting-based models, and our Stage-2 diffusion framework (Figure~\ref{fig:your_label}). For inpainting, we include CR-Fill~\cite{zhao2021crfill}, the top performer in the NODE21 Generation Track~\cite{Sogancioglu2024NODE21}. For GANs, we evaluate ACGAN~\cite{odena2017acgan} and ReACGAN~\cite{lee2021reacgan}, two widely used class-conditional frameworks. To assess the impact of synthetic nodules, we augmented the 10k real training nodules with 10k generated samples from each method and measured classification AUC on JSRT and ChestX-ray14 (Table~\ref{tab:combined-metrics2}). All augmentations improved over training on the 10k real samples alone, but diffusion-based augmentation achieved the highest AUC on both test sets (0.9023 on JSRT and 0.9318 on ChestX-ray14), indicating its effectiveness for downstream detection.
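The size of each method's gain is easier to compare as a difference from the real-only baseline. The script below uses AUC values transcribed from Table~\ref{tab:combined-metrics2}; the names are illustrative. It also makes visible that the margin over the strongest baseline is 0.0215 AUC on JSRT (over ReACGAN) but only 0.0022 on ChestX-ray14 (over CR-Fill).

```python
# AUC on (JSRT, ChestX-ray14), transcribed from the method-comparison table.
RESULTS = {
    "10k real": (0.8560, 0.9008),
    "+ ACGAN": (0.8780, 0.9281),
    "+ ReACGAN": (0.8808, 0.9259),
    "+ CR-Fill": (0.8786, 0.9296),
    "+ DiT-XL/2 (Ours)": (0.9023, 0.9318),
}


def gains(results, baseline="10k real"):
    """AUC gain of each augmentation over the real-only baseline, per test set."""
    base = results[baseline]
    return {name: tuple(round(v - b, 4) for v, b in zip(vals, base))
            for name, vals in results.items() if name != baseline}


for name, (g_jsrt, g_cxr14) in gains(RESULTS).items():
    print(f"{name:<20} JSRT {g_jsrt:+.4f}   ChestX-ray14 {g_cxr14:+.4f}")
```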

\begin{table}[h]
\centering
\caption{Effect of synthetic-data augmentation from different generative methods on nodule classification AUC on \textit{JSRT} and \textit{ChestX-ray14}.}
\label{tab:10k_configs}
\setlength{\tabcolsep}{8pt}      % default: 6pt
\renewcommand{\arraystretch}{1} % default: 1.0
\scriptsize
\begin{tabular}{lcc}
\toprule
\textbf{Augmentation} & \textbf{JSRT} & \textbf{ChestX-ray14} \\
\midrule
10k real                       & 0.8560 & 0.9008 \\
10k real + 10k ACGAN         & 0.8780 & 0.9281 \\
10k real + 10k ReACGAN       & 0.8808 & 0.9259 \\
10k real + 10k CR-Fill       & 0.8786 & 0.9296 \\
10k real + 10k DiT-XL/2 (Ours) & \textbf{0.9023} & \textbf{0.9318} \\
\bottomrule
\end{tabular}
\label{tab:combined-metrics2}
\end{table}



\section{Conclusion}
We introduced a novel diffusion-based framework for pulmonary nodule synthesis with characteristic-specific LoRA adapters and an orthogonality-constrained LoRA merging strategy. Experiments show that our method generates realistic and controllable nodules, outperforms GAN- and inpainting-based baselines in downstream classification AUC, and improves downstream CAD performance, with radiologist evaluations supporting its clinical plausibility. Limitations include difficulty generating some out-of-distribution nodules when composing multiple LoRAs. Future work includes extending the merging strategy to more than two characteristics.
%Future work will take this forward. 
