\section{Related work}

Numerous deep learning approaches have been proposed to address domain shift in histo\-pathology \citep{Gangeh2025}.
These methods can be broadly categorized into color augmentation and color normalization strategies \citep{nguyencontrimix}. 
Color augmentation increases the diversity of training data by simulating variations in staining \citep{gao2022out}.
% , but it also imposes a more complex learning task on the downstream model. 
% This often necessitates larger model capacity, leading to increased memory usage and slower inference times.
% Furthermore, reliance on augmentations firmly couples the robustness to the development of the model for the downstream task.
% In many cases, data or code to retroactively improve the robustness of a model is not accessible, or it is impractical to make modifications and reapply for regulatory approval.
Color normalization, by contrast, aims to harmonize the appearance of histological images across domains. 
Recently, normalization has faced criticism for the limited effectiveness of widely used methods.
\citet{swiderska2020impact} show that simple color normalization fails to fully bridge the performance gap across domains.
Moreover, indicators of the data source often remain even after normalization, raising questions about its utility \citep{dawood2023tissue,howard2021impact}.
However, such critiques have largely focused on non-deep-learning methods.
Even if normalization does not completely eliminate domain differences, partial alignment can still yield meaningful gains in downstream tasks.
% Furthermore, critics argue that normalization introduces additional computational steps, potentially slowing down processing pipelines \citep{jahanifar2025domain}.
% However, we contend that lightweight normalization schemes are feasible and can be deployed with minimal cost.
% Conversely, models trained on heavily augmented data may require increased capacity to handle the added variability, which also introduces computational demands, albeit at the model level rather than preprocessing.
Unlike augmentation, normalization offers a practical solution for scenarios where retraining models is infeasible, such as when data is unavailable, models are proprietary or third-party, or regulatory approval after retraining is impractical.
By harmonizing visual appearance, normalization enables the continued use of existing models.
% By harmonizing visual appearance, normalization enables the continued use of existing models across diverse datasets and institutions.
% Beyond its utility for machine learning, normalization also benefits pathologists by reducing variability in image presentation, which is especially important for telepathology and collaborative diagnostics.
% This consistency supports more reliable interpretation and decision-making, regardless of the source or technical setup of the images.
Importantly, augmentations and normalization are not mutually exclusive.
Their combined use may offer complementary benefits: augmentations enhance model robustness, while normalization improves interpretability and interoperability, especially in clinical workflows.

Traditional stain normalization methods, such as Reinhard's technique \citep{reinhard2001color} and its variants, including RandStainNA \citep{shen2022randstainna}, apply linear transformations to match color distributions.
However, these approaches may fall short in modeling non-linear variations introduced by different scanners or staining protocols.
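As a brief illustration of such a linear transformation, statistics matching in the style of Reinhard can be written per channel $k$ of a decorrelated color space, where $\mu$ and $\sigma$ denote the channel mean and standard deviation of the source ($s$) and target ($t$) image; the notation is introduced here for illustration only:
\begin{equation}
x'_k = \frac{\sigma^{t}_k}{\sigma^{s}_k}\,\bigl(x_k - \mu^{s}_k\bigr) + \mu^{t}_k .
\end{equation}
Because this map is affine in $x_k$, it can shift and rescale each channel but cannot reshape the color distribution beyond its first two moments.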
StainGAN \citep{shaban2019staingan} employs a GAN-based architecture for stain normalization, and its derivative StainNet \citep{stainnet} is a lightweight network trained under the direct supervision of StainGAN.
More advanced methods attempt to disentangle content from style attributes. 
\todo{Revisit ContriMix description and maybe add Doerrich et al.}
HistAuGAN \citep{wagner2021structure} and ContriMix \citep{nguyencontrimix}, for example, use adversarial objectives to extract attribute and content encodings. 
These can then be swapped and mixed for augmentations.
Although the results often appear visually convincing, adversarial training suffers from high sample complexity and training instability \citep{wassersteingp, instablegan, Wiatrak2019StabilizingGA}, and can be prone to hallucinating features, as shown by \citet{vasiljevic2022cyclegan}.

Recently, diffusion models such as StainDiff \citep{shen2023staindiff} and StainFuser \citep{jewsbury2024stainfuser} have emerged as alternatives to GANs. These models add noise to input images and regenerate them in a target style, using cycle consistency losses to preserve structural integrity. However, the denoising process inherently risks overwriting subtle but important features, raising concerns about their reliability in clinical applications.

% In contrast, our approach avoids both adversarial and diffusion-based frameworks. 
% Inspired by the limitations observed in prior work, we propose a direct, non-linear normalization method that preserves image content while aligning color distributions. 
% This enables robust domain adaptation with a number of practical properties and no risk of hallucination.

\begin{figure*}
\centering
\includegraphics[width=\linewidth]{figures/architecture.png}
\caption{To prevent hallucinations, the network is constrained to only three 1$\times$1 convolutional layers.
This limits the non-linear behavior of the functions that can be expressed.
The added skip connection and regularization on the weights keep the learned function smooth and close to the identity.
Within these constraints, the network is trained in an unsupervised fashion to match the color distribution of the target domain.} 
\label{fig:architecture}
\end{figure*}

\section{\rebutextra{Practical} properties}\label{sec:prop}

The similarity of the output images to the target domain and the performance of downstream applications after normalization are important measures for comparing stain normalization methods and are discussed in Section~\ref{sec:experiments}.
Other notable properties include the ease of training and the speed of inference.
Due to the simplicity of the loss function and network architecture, our method excels in these aspects; see the appendix for details.
Beyond these, further desirable properties of normalization methods are listed in Table~\ref{tab:prop}.

\subsubsection*{Retroactively applicable}
% As by  \citet{nguyencontrimix}, a distinction can be made between normalization of the stain color and augmentation of the stain color.
% Stain normalization methods aim to transform input images to make them better suited for analysis during downstream applications, such as classification or segmentation.
% Stain augmentation methods also transform input images.
% However, the transformed images are then intended to be used during training of downstream methods.
% By adding variation to the training dataset, the goal is to make normalization unnecessary.

Although data augmentation may improve inference speed compared to normalization, it is often impractical to generate a newly augmented version of the training dataset when novel methods or data become available.
This process would require access to the original training data and the training algorithm used for the downstream task.
Moreover, it could invalidate existing regulatory approvals for the downstream method, necessitating a repeat of the approval process, which may include conducting new trials.
However, stain normalization does not alter the downstream method.
As a result, it is easier to train a normalization method in an unsupervised manner and to obtain clearance for this simpler step alone, thereby improving the performance of downstream analysis.
Stain normalization is retroactively applicable to methods that have already been developed and approved, possibly by a third party.

\subsubsection*{Retention of infrequent colors}\label{sec:infrequent}
A form of information loss occurs when multiple colors are mapped to the same output.
We observe this in StainNet, for example, when it is tasked with mapping colors that are not well represented in its training dataset (see Figure~\ref{fig:cubes} in the appendix).
This happens, for instance, to the yellowish-brown specks in Figure~\ref{fig:hallucination}.
% A possible explanation for this is that StainNet is directly supervised by StainGAN, a generative adversarial network tasked with generating images, and therefor colors, from the target distribution. 
% In contrast, our networks contain residual skip connections and are trained with weight regularization.
Our networks contain residual skip connections and are trained with weight regularization.
This keeps the learned mapping close to the identity function.
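The reasoning can be sketched in notation introduced here for illustration; the squared-norm form of the penalty and the inclusion of the biases in $\theta$ are assumptions of this sketch, not a specification of the training objective.
Write the per-color mapping as
\begin{equation}
f_\theta(c) = c + g_\theta(c),
\end{equation}
where $c$ is an input color and $g_\theta$ is the correction computed by the convolutional layers, and consider an objective of the form
\begin{equation}
\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{match}}(\theta) + \lambda \,\|\theta\|_2^2 .
\end{equation}
For $\theta = 0$ the correction vanishes, $g_0 \equiv 0$, so $f_0$ is the identity, which maps distinct colors to distinct outputs.
The penalty therefore pulls the mapping towards a function that retains all colors, and a deviation from it must be paid for by a reduction in the matching term $\mathcal{L}_{\mathrm{match}}$.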

\subsubsection*{Resilient to hallucination}
Whenever a typical neural network is tasked with generating content, there is a risk of hallucination.
In the case of stain normalization, we define hallucinations as the introduction or removal of structures in the image.
% We illustrate this in Figure~\ref{fig:hallucination}.
Our method avoids such hallucinations because its receptive field is limited to a single pixel.
This makes the generation of new structures impossible.
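Stated in notation used here only for illustration: if $F$ denotes the normalization of an image $X$ and $f$ the learned color mapping, then for every pixel position $p$
\begin{equation}
F(X)_p = f(X_p), \qquad \mbox{and hence} \qquad X_p = X_q \;\Rightarrow\; F(X)_p = F(X)_q .
\end{equation}
A region of uniform color in the input therefore remains uniform in the output, so no edge or texture can appear where the input has none.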
An analogous argument can be made for StainNet, which has a very similar architecture with only 1$\times$1 convolutions but without the skip connection.
In practice, however, we do observe the removal of clinically relevant structures in StainNet outputs, as illustrated in Figure~\ref{fig:retrained}.
Our hypothesis is that this is due to the direct supervision by StainGAN, which does not have the same limitation and can hallucinate structures.
When a supervising StainGAN model outputs several colors for the same input color, depending on the context, the distilled StainNet model is forced to average these colors.
StainNet models appear particularly susceptible to this because they are prone to discarding color information, as discussed in the previous subsection.
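The averaging effect can be made precise under a simplifying assumption.
Suppose a student $f$ that acts on single colors is trained to minimize the squared error against the teacher output $T(X)_p$ at pixel $p$; the loss actually used for distillation may differ, and with an absolute error the mean below becomes a componentwise median.
The minimizer is then the conditional mean
\begin{equation}
f^{\ast}(c) = \mathrm{E}\bigl[\, T(X)_p \;\big|\; X_p = c \,\bigr] .
\end{equation}
Whenever the teacher maps the same input color $c$ to different outputs depending on context, the student returns a blend of these outputs, which can reduce the contrast between a structure and its surroundings.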
% In theory, it is still possible for our models to remove structures when a set of colors is entirely mapped to similar colors.
% We assume such a failure mode is rare, further mitigated by regularization, and would be noticed.
% A similar argument can be made for ContriMix, as its image generation involves the product between the content and attribute encodings per pixel. 
% However, the receptive field of the content encoder is larger than one pixel so in theory it is possible that new structures are introduced, though we did not observe this.

\subsubsection*{Scale independent}
Most stain normalization methods take crops at a specified zoom level as input.
% Even if the image size of a whole slide image does not match, simply cropping and stitching the results back together can produce inputs at the wrong scale and therefor unexpected results.
Simply tiling a larger image and stitching the results back together can still present the model with inputs at the wrong scale, leading to unexpected results.
Our method functions only on colors and is unaffected by the scale at which those colors are presented.
Even during training, heterogeneous scales may be present within and between the source and target datasets.
This makes it easier for practitioners to construct a dataset.
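This property can be written down explicitly, again in notation introduced only for illustration.
Let $F$ apply a color mapping $f$ to every pixel, $F(X)_p = f(X_p)$, and let $S$ be any operation that only selects or rearranges pixels, such as cropping, tiling, or nearest-neighbor subsampling, so that $S(X)_q = X_{\pi(q)}$ for some index map $\pi$.
Then
\begin{equation}
F\bigl(S(X)\bigr)_q = f\bigl(X_{\pi(q)}\bigr) = S\bigl(F(X)\bigr)_q ,
\end{equation}
that is, normalizing before or after such an operation gives the same result.
The identity is exact only for selection operations; resampling with interpolation blends neighboring colors and commutes with a non-linear $f$ only approximately.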

\subsubsection*{Structure independent}
When the receptive field of a method is limited to one pixel at a time, it is impossible to rely on anatomical structures for the generation process.
All such methods do is model how the digitization of true colors differs due to factors such as slide preparation, scanner, and settings.
As such, it does not matter what anatomical structure or applied protein exhibits that color.
This allows a multimodal approach to constructing datasets for training, as slides from different sources can be aggregated.
% Such training is not possible for StainNet as it relies on StainGAN.
