\title{Appendix}
\section{Datasets} 
\label{appendix:datasets}

\subsection{Fetal Head Segmentation  Datasets}
We evaluate the proposed method on two fetal head segmentation datasets.
\begin{itemize}
    \item The RADBOUD-FetalHC dataset \cite{vandenHeuvel2018} consists of $999$ 2D ultrasound images for fetal head circumference (HC) measurement, collected from $551$ pregnant women during routine screenings at Radboud University Medical Center between $2014$ and $2015$. The images are $800 \times 540$ pixels in size, with a pixel size between $0.052$ and $0.326$ mm, and include only fetuses without growth abnormalities.
\item The JNU-IFM dataset \cite{Lu.2022} comprises $6224$ fetal head ultrasound images extracted from $78$ videos recorded from $51$ pregnant women between $2019$ and $2020$. It originates from the Intelligent Fetal Monitoring Lab of Jinan University.
The images are grouped into four classes: \textit{None} (1,022 images),
\textit{OnlySP} (323 images), \textit{OnlyHead} (1,136 images), and \textit{SPHead} (3,743 images). We evaluate only on samples with the \textit{OnlyHead} label.
\end{itemize}

\subsection{Thyroid Gland  Datasets}
For thyroid analysis, two datasets were used. 
\begin{itemize}
    \item 
The Magdeburg-Thyroid3D dataset \cite{Wunderling2017} includes sixteen 3D ultrasound scans of healthy thyroid lobes from the University Hospital Magdeburg\footnote{\url{https://www.med.uni-magdeburg.de/en/}}, each with expert-annotated segmentations. To use the dataset, we split each 3D volume into 2D slices and keep only the slices that contain thyroid annotations. The resulting 2D image--mask pairs are used in our experiments.

\item The TG3K dataset \cite{gong2022thyroid} comprises $3585$ 2D B-mode thyroid gland ultrasound images acquired with GE Logiq E9 scanners, with pixel-wise annotations for gland segmentation.
\end{itemize}
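
The slice extraction applied to Magdeburg-Thyroid3D can be written compactly. As a sketch, assume each scan is a volume $V$ of size $H \times W \times D$ with a binary annotation volume $M$ of the same size, sliced along the third axis. The symbols $V$, $M$, $\mathcal{S}$, and the choice of slicing axis are assumptions introduced here for illustration.
\[
\mathcal{S} = \Bigl\{ \bigl(V_{:,:,z},\, M_{:,:,z}\bigr) \;:\; z \in \{1,\dots,D\},\ \sum_{i=1}^{H}\sum_{j=1}^{W} M_{i,j,z} > 0 \Bigr\}
\]
Each element of $\mathcal{S}$ is one 2D image--mask pair, and slices with an empty mask are discarded.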

\subsection{CAMUS (Cardiac Acquisitions for Multi-structure Ultrasound Segmentation) Dataset}
The CAMUS dataset \cite{Leclerc.2019} consists of 2D echocardiographic image sequences in apical four-chamber and two-chamber views, acquired from $500$ patients at the University Hospital of Saint-Étienne (France).
\begin{itemize}
    \item Images include expert annotations of the left ventricular endocardium, epicardium, and left atrium at end-diastole and end-systole, and they cover a wide range of cardiac pathologies and imaging conditions.
    \item The dataset is publicly available for benchmarking cardiac segmentation and volume estimation tasks in echocardiography.
\end{itemize}

\subsection{BUSI (Breast Ultrasound Images) Dataset}
The BUSI dataset \cite{AlDhabyani.2020} includes a collection of B-mode breast ultrasound images used for lesion segmentation and classification.
\begin{itemize}
    \item The dataset comprises $780$ ultrasound images from approximately $600$ female patients aged between $25$ and $75$ years. The images were captured using LOGIQ E9 and LOGIQ E9 Agile ultrasound systems at Baheya Hospital in Egypt.
    \item Each image has a resolution of $500 \times 500$ pixels and is assigned to one of three classes: normal ($133$), benign ($437$), or malignant ($210$).
    \item For benign and malignant cases, pixel-level segmentation masks of the lesion areas are provided to support tumor delineation and further analysis.
\end{itemize}

\subsection{BUS-BRA (Breast Ultrasound Dataset)}
The BUS-BRA dataset \cite{WilfridoGomezFlores.2023} provides a large-scale breast ultrasound image collection designed to support computer-aided diagnosis and segmentation evaluation.
\begin{itemize}
    \item It contains $1875$ anonymized ultrasound images from $1064$ patients undergoing routine breast ultrasound examinations in Brazil. The images were acquired using four different scanners.
    \item Each image includes a biopsy-proven lesion label (benign or malignant), along with expert-annotated segmentation masks of the lesion and breast tissue regions.
\end{itemize}
\newpage
\section{Implementation Details For ICL-NoiseUNet} 
\label{appendix:implementation}
\vspace{-1em}
\input{tables/implementation_details}
\newpage
\section{Sensitivity Analysis of the Noise Modulation Block}
\label{appendix:nmb_ablation}
\subsection{Qualitative Results}
%To evaluate the contribution of each noise map that exists in the Noise Modulation Block (NMB), we perform an ablation study on the CAMUS \cite{Leclerc.2019} and BUS–BRA \cite{WilfridoGomezFlores.2023} datasets. 
%We compare four variants: (1) Full NMB (residual + variance maps), (2) Residual noise–only computation, (3) Variance–only computation, and (4) No NMB. Statistical significance is calculated using the Wilcoxon pairwise signed–rank test for dice scores. Across both datasets, the full design that combines the residual and variance maps yields the strongest results. Also, table~\ref {tab:nmb_ablation} proves the necessity of using these components together. The “Residual Only” and “Variance Only” variants perform similarly to each other but are consistently worse than the full design. Also, removing the Noise Modulation Module leads to a performance decline. Thus, results highlight the importance of the module. All differences relative to the full model are statistically significant, so they validate our hypothesis.

%\input{tables/ablation_nmb}

\input{figures/ablation_busbra}
\FloatBarrier
\input{figures/ablation_camus}
%\paragraph{Qualitative Results Discussion} The residual-only variant preserves sharp boundaries but often misses low-contrast or blurred regions, resulting in increased false negatives. In contrast, the variance-only variant suppresses noise but frequently overextends into surrounding tissue, leading to a higher number of false positives. By combining both descriptors, the full block reduces both false negatives and false positives, resulting in cleaner and more anatomically consistent segmentation.

\subsection{Window Size Selection For Residual Noise and Variance Maps}
Table~\ref{tab:window_ablation} shows that a window size of $7$ gives the most reliable noise maps for the model. Smaller windows ($3$ and $5$) capture little local context, which leads to weaker noise estimation and lower accuracy. A larger window ($9$) fails to capture important boundaries. A window size of $7$ therefore provides the best balance between detail and context, as reflected in the highest Dice and IoU scores across both datasets, and we select it as the default window size in our model.
\input{tables/ablation_window_size}
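
To make the role of the window size concrete, one common way to define windowed noise descriptors is sketched below. This is an assumed formulation for illustration only: the symbols $I$, $\mathcal{N}_w$, $\mu_w$, $r_w$, and $v_w$ are introduced here, and the exact maps used in the Noise Modulation Block are those defined in the main text. For an image $I$ and a $w \times w$ neighborhood $\mathcal{N}_w(p)$ centered at pixel $p$,
\[
\begin{array}{rcl}
\mu_w(p) &=& \displaystyle \frac{1}{w^2} \sum_{q \in \mathcal{N}_w(p)} I(q), \\[1.5ex]
r_w(p) &=& I(p) - \mu_w(p), \\[1.5ex]
v_w(p) &=& \displaystyle \frac{1}{w^2} \sum_{q \in \mathcal{N}_w(p)} \bigl(I(q) - \mu_w(p)\bigr)^2 .
\end{array}
\]
Under this formulation, each estimate averages $w^2$ pixels: $9$ for $w=3$, $25$ for $w=5$, $49$ for $w=7$, and $81$ for $w=9$. A small window yields a noisy local estimate, whereas a large window averages across structure boundaries.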

\section{Additional Results Regarding The Effect of Context Size on Segmentation Performance}
\label{appendix:context_ablation_table}
\input{tables/context_size_ablation}
Table~\ref{tab:context_ablation} shows that a context size of $L=4$ achieves the highest Dice and IoU scores. The statistical analysis confirms that all other configurations differ significantly from $L=4$, which identifies it as the best choice and supports the analysis in Section~\ref{sec:context_ablation}.
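
For reference, we assume the standard overlap definitions of the two reported metrics. For a predicted mask $P$ and a ground-truth mask $G$, both viewed as sets of foreground pixels (notation introduced here for illustration),
\[
\mathrm{Dice}(P,G) = \frac{2\,|P \cap G|}{|P| + |G|}, \qquad
\mathrm{IoU}(P,G) = \frac{|P \cap G|}{|P \cup G|}.
\]
For a single image the two are monotonically related by $\mathrm{IoU} = \mathrm{Dice}/(2 - \mathrm{Dice})$, so a Dice score of $0.90$ corresponds to an IoU of about $0.82$. The relation does not hold exactly for scores averaged over a dataset.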

\section{Evaluation of Shared Encoder-Decoder Weights for ICL-NoiseUNet}
\label{appendix:shared_weights}
We further evaluate our approach with encoder-decoder weights shared between the target and context backbones. On the CAMUS \cite{Leclerc.2019} dataset, ICL-NoiseUNet reaches a Dice score of $0.933$. We also implement ICL-NoiseWNet, in which a W-Net \cite{xia2017wnetdeepmodelfully} with shared parameters is used for the target and context branches. ICL-NoiseWNet achieves a Dice score of $0.940$ and a Recall of $0.966$. In thyroid gland segmentation, the full design consistently achieves Dice scores at least $2\%$ higher than the variants without the Noise Modulation Block (NMB). These results indicate that integrating the NMB into each encoder-decoder block provides a consistent performance advantage across different medical segmentation tasks.
\input{tables/Camus_Unet_Wnet}

\section{Inference Time and Parameter Size Analysis}
\label{appendix:inference}
\input{tables/inference_size}


%\section{Additional Comparison Results}
%\input{tables/busbra_SAM_SOTA}
\FloatBarrier
\section{Effect of Different Context Sizes at Inference}
\label{appendix:context_inference}
To further evaluate the robustness of ICL-NoiseUNet to variations in context size, we trained the model with a fixed context size of $L=4$ and then tested it with different context sizes. We present the Dice and IoU distributions for the BUS-BRA \cite{WilfridoGomezFlores.2023} and CAMUS \cite{Leclerc.2019} datasets. The results show that segmentation performance remains stable across the tested context sizes. In other words, the model captures contextual information effectively during training, which enables reliable segmentation even when the amount of available context changes at inference time.
\input{figures/context_inference}
\newpage
\section{Representation Level Analysis}
\textcolor{red}{%To provide the deeper analysis requested, we conducted t-SNE visualization of bottleneck features from 100 samples (67 benign, 33 malignant) on the BUS-BRA \cite{WilfridoGomezFlores.2023} dataset.  As can be seen in Figure ~\ref{fig:feature_level}, without NMB, benign and malignant features are heavily intermixed, indicating that the latent space lacks clear class structure. In contrast, when NMB is incorporated, features form more compact and class-aligned clusters with visibly reduced overlap (Fig. X, left), which is quantitatively reflected by a higher silhouette score (0.26 vs. 0.01). While t-SNE is primarily a visualization tool, the consistent improvement in both visual separability and clustering score suggests that NMB leads to more structured and class-discriminative representations. This representation-level behavior is consistent with the observed improvements in segmentation performance. 
\label{appendix:representation}
 \input{figures/tsne}
 }
 \section{Robustness to Context Selection Method}
\input{tables/context_method}
\label{appendix:robustness}
\label{appendix:context_sel}
\section{Comparison with Few-Shot Learning Approaches and Feature-Wise Conditioning}
\label{appendix:few_shot}
\input{tables/few_shot_comparisons}
%\section{Qualitative Comparison of Model Variants}
%\label{appendix:qualitative_nmb}
%\input{figures/ablation_busbra}
%\FloatBarrier
%\input{figures/ablation_camus}
%\paragraph{Short Discussion} The residual-only variant preserves sharp boundaries but often misses low-contrast or blurred regions, resulting in increased false negatives. In contrast, the variance-only variant suppresses noise but frequently overextends into surrounding tissue, leading to a higher number of false positives. By combining both descriptors, the full block reduces both false negatives and false positives, resulting in cleaner and more anatomically consistent segmentation.
\section{Learned Modulation Parameters}
\label{appendix:modulation_parameters}
The learned residual weights $\alpha_k$ and variance weights $\beta_k$ exhibit clear trends across datasets, as summarized in Table~\ref{tab:noise_weights}. For the CAMUS \cite{Leclerc.2019} and Magdeburg-Thyroid3D \cite{Wunderling2017} datasets, both weights start at relatively low values in the early network blocks (approximately $0.44$--$0.48$) and increase progressively with network depth, reaching $0.55$--$0.56$ in the deepest blocks. This suggests that the model learns to apply stronger noise modulation in deeper layers, where features become more semantic.
In contrast, on the BUS-BRA \cite{WilfridoGomezFlores.2023} dataset, $\alpha_k$ and $\beta_k$ remain relatively stable across layers and stay within a narrow range of $0.48$--$0.52$. This suggests that breast ultrasound segmentation benefits from similar noise handling at all network blocks. Across all datasets, the residual and variance weights remain approximately equal, which indicates that the two analytic descriptors contribute comparably at every network level. Overall, these observations show that the proposed model adapts its noise modulation strategy to each task rather than applying a single strategy across all datasets.
\input{tables/learned_modulation_parameters}