

\section{Results and Discussion}
\label{sec:results}

In this section, we present the results of the experiments. 
We validate our proposed self-supervised domain adaptation approach on two publicly available datasets (Section~\ref{subsec:dataset}) and compare it to current \ac{SOTA} methods for \ac{UDA} in Section~\ref{subsec:crossdomain_cls}. To assess the performance of our approach in a clinically relevant use case, we further validate it on \acp{WSI} crops from our in-house cohort in Section~\ref{subsec:crossdomain_seg}. 
To help future research, the implementation is open source\footnote{Code available on \texttt{\href{https://github.com/christianabbet/SRA}{https://github.com/christianabbet/SRA}}.}. 
Further details on the experimental setup can be found in Appendix~\ref{app:experimental-setup}.

\subsection{Public Datasets}
\label{subsec:dataset}

% Quick intro dataset
In this study, we use two publicly available datasets, \ac{K19} \cite{kather2019predicting} and \ac{K16} \cite{kather2016multi}. The former is composed of $100{,}000$ image patches sampled from $9$ different \ac{CRC} tissue types (tumor, stroma, muscle, lymphocytes, debris, mucus, normal mucosa, adipose, and background), while the latter includes $5{,}000$ crops distributed over $8$ tissue types (tumor, stroma, complex stroma, lymphocytes, debris, normal mucosa, adipose, and background). Following a discussion with expert pathologists, we group stroma/muscle and debris/mucus as stroma and debris, respectively, to obtain a consistent class mapping between \ac{K19} and \ac{K16}. 
Complex stroma, which is only present in \ac{K16}, is kept for training but excluded from the evaluation process. 
With this problem definition, we fall into an open-set scenario, where the class distribution of the two domains does not exactly match, as opposed to a closed-set adaptation scheme. For more information about the datasets and their discrepancies, please refer to Appendix~\ref{app:datasets}. 
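
The class grouping described above can be written as a simple lookup. The following is a minimal sketch and not the released implementation: the class-name strings, the dictionary names, and the helpers \texttt{harmonize} and \texttt{evaluation\_subset} are hypothetical and chosen for readability, and marking complex stroma with \texttt{None} is only one possible way to exclude it from the evaluation.

```python
# Hypothetical class names; the identifiers used in the datasets may differ.
K19_TO_SHARED = {
    "tumor": "tumor",
    "stroma": "stroma",
    "muscle": "stroma",            # stroma/muscle are grouped as stroma
    "lymphocytes": "lymphocytes",
    "debris": "debris",
    "mucus": "debris",             # debris/mucus are grouped as debris
    "normal mucosa": "normal mucosa",
    "adipose": "adipose",
    "background": "background",
}

K16_TO_SHARED = {
    "tumor": "tumor",
    "stroma": "stroma",
    "complex stroma": None,        # target-only class: used in training, not evaluated
    "lymphocytes": "lymphocytes",
    "debris": "debris",
    "normal mucosa": "normal mucosa",
    "adipose": "adipose",
    "background": "background",
}

# Classes shared by both domains after grouping.
SHARED_CLASSES = sorted(set(K19_TO_SHARED.values()))


def harmonize(labels, mapping):
    """Map raw labels to shared labels; None marks samples excluded from evaluation."""
    return [mapping[label] for label in labels]


def evaluation_subset(predictions, raw_targets, mapping):
    """Keep only (prediction, target) pairs whose target belongs to a shared class."""
    pairs = zip(predictions, harmonize(raw_targets, mapping))
    return [(p, t) for p, t in pairs if t is not None]
```

With this mapping, both domains share seven evaluation classes, and any \ac{K16} sample labeled as complex stroma is dropped before the scores are computed.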

\subsection{Cross-Domain Patch Classification}
\label{subsec:crossdomain_cls}

\input{table/tab_k19_k16}

\begin{figure*}[ht]
\centering
  \includegraphics[width=0.99\textwidth]{media/tsne.pdf}
  \caption{The t-SNE projection of the source (\acl{K19}) and target (\acl{K16}) domain embeddings. The top row shows the alignment between the source and target domain, while the bottom row highlights the representations of the different classes. We compare our approach (d) to other \ac{UDA} methods (a-c).}
  \label{fig:tsne}
\end{figure*}

In this task, we use the larger set \ac{K19} with $1\%$ of the source labels available and adapt it to \ac{K16}. This simulates a clinical setting in which a large quantity of unlabeled data is typically available, but only a few samples are labeled. 
The results of our proposed \ac{SRA} method are presented in \tableref{tbl:cross-domain}, in comparison with the \ac{SOTA} algorithms for domain adaptation. We first train our model in an unsupervised fashion ($\mathcal{L}_{\mathrm{SRA}}$) and then fit a linear classifier with the few available source labels on top of the frozen model weights.
As the lower bound, we consider direct transfer learning, where the model is trained in a supervised fashion on the source data only. 
Analogously, the upper bound is obtained by training on the target domain data in a fully supervised fashion. \figureref{fig:tsne} shows the t-SNE projection and alignment of the domain adaptation for the source only, top-performing baselines (OSDA, SSDA with jigsaw solving), and our method (\ac{SRA}). Appendices~\ref{app:queue}--\ref{app:tsne} provide the results of the ablation study as well as additional results. 

Stain normalization slightly decreases the performance, as it introduces color artifacts that mislead the classifier. Our proposed \ac{SRA} method shows an excellent alignment between the same class clusters of the source and target distributions and outperforms other approaches in terms of weighted F1 score. 
Notably, our approach is even able to match the upper bound model for normal and tumor tissue identification.
The embedding of complex stroma, which only exists in the target domain, is represented as a single cluster with no matching candidates, which shows that the model was not forced to find suitable matches.
% Add more if space allows .. ?
Furthermore, the cluster representation is more compact than with SSDA, where, for example, normal mucosa tends to be aligned with complex stroma and tumor.
SSDA and OSDA misclassify debris as lymphocytes due to their similar texture and structure. Self-Path suffers from the instability of the DANN loss, which leads to large performance gaps during training. Heavier data augmentations partially solve this issue.
Our approach suffers a drop in performance for stroma detection, which can be explained by the presence of lymphocytes in numerous stroma tissue examples, causing a higher rate of misclassification. 
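
For reference, we assume here the standard support-weighted definition of the F1 score, in which the per-class scores are averaged according to the number of evaluated samples of each class. The symbols $C$ (number of evaluated classes), $n_c$ (number of samples of class $c$), $N = \sum_{c} n_c$, and the per-class true positive, false positive, and false negative counts $\mathrm{TP}_c$, $\mathrm{FP}_c$, and $\mathrm{FN}_c$ are introduced for illustration only:
\[
\mathrm{F1}_{w} = \sum_{c=1}^{C} \frac{n_c}{N}\,\mathrm{F1}_c,
\qquad
\mathrm{F1}_c = \frac{2\,\mathrm{TP}_c}{2\,\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}.
\]
Under this definition, frequent classes contribute more to the overall score than rare ones.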


\subsection{Use Case: Cross-Domain Segmentation of WSIs}
\label{subsec:crossdomain_seg}


\begin{figure}[t]
\centering
  \includegraphics[width=0.99\textwidth]{media/bern_wsi.pdf}
  \caption{Examples of domain adaptations from \ac{K19} to our in-house dataset. (a-b) show the original sections from the \acp{WSI} and their ground truth, respectively. We compare the performance of our \acf{SRA} algorithm (f) to the lower bound and the previous top methods (c-e). We report the pixel-wise accuracy, the weighted \acl{IoU}, and the pixel-wise Cohen's kappa ($\kappa$) score.}
  \label{fig:bern_wsi}
\end{figure}

To further validate our approach in a real-world scenario, we perform domain adaptation using our proposed model from \ac{K19} to our in-house slides and validate it on \acp{WSI} sections. 
We randomly extract patches from over 300 \acp{WSI} to train our model and then use a sliding window approach to predict the class of each patch in the selected regions.
The final prediction map is smoothed using conditional random fields as in \citet{chan2019histosegnet}. 
The results are presented in \figureref{fig:bern_wsi}, alongside the original \ac{HE} crops, their corresponding segmentations annotated by an expert pathologist according to the definitions used in the \ac{K19} dataset, and the results of the compared approaches. 
The three sections were selected such that, overall, they represent all tissue types equally.
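
The sliding-window prediction step described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the released implementation: the patch size, the stride, and the callable \texttt{classify\_patch} (standing in for the adapted model followed by the linear classifier) are hypothetical, the region is represented as a plain list of rows, and the conditional random field smoothing is omitted.

```python
def sliding_window_coords(height, width, patch, stride):
    """Yield the top-left (row, col) of every patch that fits fully inside the region."""
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            yield top, left


def predict_map(region, patch, stride, classify_patch):
    """Classify each window of a 2D region and return a grid of predicted labels.

    `classify_patch` is a hypothetical callable that receives one window
    (a list of row slices) and returns its predicted class.
    """
    height, width = len(region), len(region[0])
    rows = {}
    for top, left in sliding_window_coords(height, width, patch, stride):
        window = [row[left:left + patch] for row in region[top:top + patch]]
        rows.setdefault(top, []).append(classify_patch(window))
    # One output row per vertical window position, ordered from top to bottom.
    return [rows[top] for top in sorted(rows)]
```

Choosing a stride smaller than the patch size yields overlapping windows and a denser prediction map, at the cost of more forward passes.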

Our approach outperforms the domain adaptation methods in terms of pixel-wise accuracy, weighted \ac{IoU}, and pixel-wise Cohen's kappa score.
For all models, stroma and muscle are poorly differentiated as both have similar visual features. 
This phenomenon is even more apparent in the source only setting where muscle tissue is almost systematically interpreted as stroma. 
SSDA tends to predict lymphocyte aggregates as debris, which can be explained by its sensitivity to staining variations.
OSDA, on the other hand, fails to adapt and generalize to new debris examples while trying to reject untrusted samples.
Regions with mixtures of tissue types (e.g., lymphocytes + stroma or stroma + isolated tumor cells) are challenging cases because the samples available in online cohorts mainly contain ideal examples with homogeneous tissue textures for each patch, and no mixed-class examples. Consequently, domain adaptation methods naturally struggle to align the features, resulting in a biased classification. 
We also observe that thinner or torn stroma regions, where the background is clearly visible, are often misclassified as adipose tissue by \ac{SRA}, which is most likely due to their similar appearance.
However, our \ac{SRA} model is able to correctly distinguish between normal mucosa and tumor, which are tissue regions with relevant information for survival analysis.  
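
For reference, the three scores reported in \figureref{fig:bern_wsi} can all be derived from a pixel-wise confusion matrix. The sketch below is illustrative only and is not taken from the released code: the function names are hypothetical, class labels are assumed to be integers in $[0, C)$, and the \ac{IoU} is assumed to be weighted by the ground-truth frequency of each class.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count pixels: rows index the ground-truth class, columns the predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the ground truth."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def weighted_iou(cm):
    """Per-class IoU averaged with weights equal to the ground-truth class frequency."""
    n = len(cm)
    total = sum(map(sum, cm))
    score = 0.0
    for c in range(n):
        support = sum(cm[c])                          # ground-truth pixels of class c
        predicted = sum(cm[r][c] for r in range(n))   # pixels predicted as class c
        union = support + predicted - cm[c][c]
        if union > 0:
            score += (support / total) * (cm[c][c] / union)
    return score


def cohen_kappa(cm):
    """Agreement between prediction and ground truth, corrected for chance agreement."""
    n = len(cm)
    total = sum(map(sum, cm))
    observed = sum(cm[i][i] for i in range(n)) / total
    expected = sum(
        sum(cm[c]) * sum(cm[r][c] for r in range(n)) for c in range(n)
    ) / total ** 2
    if expected == 1.0:
        return 1.0  # degenerate case: a single class in both maps
    return (observed - expected) / (1.0 - expected)
```

Unlike the pixel-wise accuracy, Cohen's kappa discounts the agreement expected by chance, which makes it less sensitive to sections dominated by a single tissue type.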
