\section{Tissue-adaptive segmentation with superpixels}
\label{app:segmentation}
To illustrate that superpixels generated with SLIC~\cite{radhakrishna2012SLIC} respect tissue boundaries, \figureref{fig:slic} overlays such a segmentation on a tissue image.
The individual regions are irregularly shaped, and their borders closely follow natural tissue structures, which makes them well suited for creating the initial nodes of our graph.
\begin{figure}[h]
\centering
\includegraphics[width=0.45\linewidth]{figures/slic_half.png}
\caption{Illustration of fine-grained, tissue-adaptive segmentation. Superpixel boundaries, highlighted in yellow, are overlaid on a tissue micrograph to demonstrate their precise alignment with the underlying morphological structures.}
\label{fig:slic}
\end{figure}
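The step from superpixels to graph nodes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not our exact pipeline: it takes an integer label map such as the one returned by scikit-image's \texttt{skimage.segmentation.slic}, describes each superpixel by its pixel count and centroid, and connects superpixels that touch horizontally or vertically. The function name, the node attributes, and the adjacency rule are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def superpixels_to_graph(labels: List[List[int]]) -> Tuple[Dict[int, dict], Set[Tuple[int, int]]]:
    """Turn a 2D superpixel label map into node records and an undirected edge set.

    Assumptions (illustrative only): each superpixel becomes one node described by
    its pixel count and centroid (row, col); two nodes are connected if their
    regions share a horizontal or vertical pixel border.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # label -> [pixel count, row sum, col sum]
    edges: Set[Tuple[int, int]] = set()
    n_rows, n_cols = len(labels), len(labels[0])
    for r in range(n_rows):
        for c in range(n_cols):
            lab = labels[r][c]
            stats = acc[lab]
            stats[0] += 1
            stats[1] += r
            stats[2] += c
            # Checking only the right and bottom neighbours covers 4-connectivity once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < n_rows and cc < n_cols and labels[rr][cc] != lab:
                    other = labels[rr][cc]
                    edges.add((min(lab, other), max(lab, other)))
    nodes = {
        lab: {"pixels": n, "centroid": (rs / n, cs / n)}
        for lab, (n, rs, cs) in acc.items()
    }
    return nodes, edges


# Toy label map with three superpixels (labels 0, 1, 2).
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
nodes, edges = superpixels_to_graph(labels)
print(sorted(edges))  # [(0, 1), (0, 2), (1, 2)]
print(nodes[1])       # {'pixels': 5, 'centroid': (0.8, 2.6)}
```

A NumPy label array from an SLIC implementation can be passed after converting it with \texttt{.tolist()}.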


\section{Additional qualitative examples}
\label{app:examples}
\figureref{fig:example2} and \figureref{fig:example3} show additional qualitative examples, each with the highlighted regions in the context of the whole slide, a zoomed-in view of a highlighted region, and the feature attribution scores for that region. Only a few critical regions are highlighted, as they are the main drivers of the prediction. Moreover, out of the 191 features, only a handful are major contributors to the prediction, so they serve as a concise explanation accompanying it.

\begin{figure*}[ht]
\centering
\includegraphics[width=0.6\linewidth]{figures/exp1.png}
\caption{Top left: original WSI; top right: highlighted important regions; bottom left: zoomed-in view of a highlighted region; bottom right: feature attribution scores.}
\label{fig:example2}
\end{figure*}


\begin{figure*}[ht]
\centering
\includegraphics[width=0.6\linewidth]{figures/exp2.png}
\caption{Top left: original WSI; top right: highlighted important regions; bottom left: zoomed-in view of a highlighted region; bottom right: feature attribution scores.}
\label{fig:example3}
\end{figure*}

\section{Experimental setup}
\label{app:experiment}

To ensure a rigorous and unbiased evaluation, we first partition the entire dataset into a fixed training set and a hold-out test set. All splits are performed at the location- (and therefore also patient-) level to prevent data leakage.
Model optimization and selection are conducted solely on the training set. For each method, we perform a random search with 25 trials over the learning rate and weight decay. In each trial, five model instances are trained on a sub-partition of the training set and evaluated on a validation set. The five models from the trial with the best average validation performance are then selected, and we report the mean and standard deviation of their performance on the hold-out test set. To assess statistical significance between our method and the baselines, we apply an independent two-sample Student's t-test to the two sets of five test scores.
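The significance test can be sketched as follows. This is a minimal pure-Python version of the independent two-sample Student's t-test with pooled variance; the scores in the example are hypothetical placeholders, not results from our experiments, and the function name is illustrative. With five runs per method there are $5 + 5 - 2 = 8$ degrees of freedom, and $|t| > 2.306$ corresponds to $p < 0.05$ (two-sided).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Independent two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    dof = na + nb - 2
    # Pooled estimate of the common variance; Student's test assumes equal variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / dof
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, dof


# Two-sided critical value of the t distribution for alpha = 0.05 and 8 degrees of freedom.
T_CRIT_DOF8 = 2.306

# Hypothetical test scores of five runs per method (placeholders, not reported results).
ours = [0.70, 0.72, 0.71, 0.73, 0.69]
baseline = [0.66, 0.67, 0.65, 0.68, 0.64]
t, dof = students_t(ours, baseline)
print(round(t, 3), dof, abs(t) > T_CRIT_DOF8)  # 5.0 8 True
```

In practice, \texttt{scipy.stats.ttest\_ind(ours, baseline)} computes the same statistic (its default \texttt{equal\_var=True} corresponds to Student's test) together with the exact p-value.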

The first task, cancer stage prediction, involves predicting the pathological stage of the tumor directly from the WSI. Cancer staging is a critical component of clinical oncology, as it describes the extent of cancer progression and serves as a basis for treatment planning.

The second task, survival prediction, stratifies patients into risk groups based on their predicted prognosis.
Following~\cite{wulczyn2020survival}, we frame this task as multi-class classification: patients' risks are discretized into four groups, and the group assignment serves as the prediction target. As is standard in the survival prediction literature, we evaluate models on this task with the concordance index (c-index)~\cite{harrell1982cindex}.

The c-index measures the ratio of correctly ordered (concordant) pairs to the total number of informative pairs. To account for substantial differences in follow-up duration and median survival across cancer types, we compute the c-index for the combined cohort using stratified aggregation~\cite{wulczyn2020survival}: concordant and informative pairs are summed within each individual study before the final ratio is computed, so that patients are only ranked relative to other patients from the same study.
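A minimal sketch of this stratified aggregation in plain Python follows. It assumes Harrell's pair definition: a pair is informative if the patient with the shorter observed time experienced the event, and concordant if that patient also received the higher predicted risk group; ties in the predicted risk group count as one half. Pairs with tied observed times are skipped for simplicity. The tie handling, the integer encoding of risk groups (higher means worse prognosis), and all names are assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def pair_counts(times: Sequence[float], events: Sequence[int],
                risks: Sequence[int]) -> Tuple[float, int]:
    """Concordant and informative pair counts within one study (Harrell's definition)."""
    concordant, informative = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied observed times are skipped
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a has the shorter time
        if not events[a]:
            continue  # the earlier patient is censored, so the pair is not informative
        informative += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # earlier event was assigned the higher risk group
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk groups count as half-concordant
    return concordant, informative


def stratified_cindex(times, events, risks, studies: Sequence[Hashable]) -> float:
    """Sum pair counts per study, then divide once, so no cross-study pairs are formed."""
    by_study = defaultdict(list)
    for idx, study in enumerate(studies):
        by_study[study].append(idx)
    total_conc, total_inf = 0.0, 0
    for idxs in by_study.values():
        c, n = pair_counts([times[k] for k in idxs],
                           [events[k] for k in idxs],
                           [risks[k] for k in idxs])
        total_conc += c
        total_inf += n
    if total_inf == 0:
        raise ValueError("no informative pairs")
    return total_conc / total_inf


# Toy cohort: five patients from two studies (event = 1, censored = 0).
times = [1, 2, 3, 1, 5]
events = [1, 1, 0, 1, 0]
risks = [3, 2, 1, 0, 1]  # predicted risk group, 0 (lowest) to 3 (highest)
studies = ["A", "A", "A", "B", "B"]
print(stratified_cindex(times, events, risks, studies))    # 0.75
print(stratified_cindex(times, events, risks, ["all"] * 5))  # 0.625
```

In the toy example, pooling all patients into a single stratum also compares patients across studies, which changes the score from 0.75 to 0.625; summing pair counts per study before dividing avoids such cross-study comparisons.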


\section{Nuclear explanations}
\label{app:hover}
HoVer-Net, trained on PanNuke~\cite{gamper2019pannuke}, assigns each detected nucleus one of the following five types (its sixth output class denotes background), which can be used to provide detailed explanations:

\begin{itemize}
    \item Neoplastic: These are the tumor cells themselves (malignant or benign), characterized by abnormal growth.
    \item Non-Neoplastic Epithelial: These are normal, hyperplastic, or dysplastic epithelial cells that are not part of the tumor mass.
    \item Inflammatory: Immune system cells, such as lymphocytes and macrophages, responding to the tumor microenvironment.
    \item Connective / Soft Tissue: Cells forming the stroma and supporting tissue, like fibroblasts and endothelial cells.
    \item Dead: Nuclei of cells in apoptotic or necrotic states, often fragmented or degraded, which can be an important biological indicator.
\end{itemize}
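As an illustration of how such per-nucleus labels could feed region-level explanations, the following sketch computes the fraction of each nucleus type among the nuclei detected in one region. This is a hypothetical feature written for illustration; the type strings and the function name are assumptions, and the nuclear features we actually use are listed in Appendix~\ref{app:features}.

```python
from collections import Counter
from typing import Dict, Iterable

# The five nucleus types listed above (hypothetical string identifiers).
NUCLEUS_TYPES = ("neoplastic", "non-neoplastic epithelial", "inflammatory", "connective", "dead")


def nuclear_composition(types: Iterable[str]) -> Dict[str, float]:
    """Fraction of each nucleus type among the nuclei detected in one region."""
    counts = Counter(types)
    unknown = set(counts) - set(NUCLEUS_TYPES)
    if unknown:
        raise ValueError(f"unknown nucleus types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in NUCLEUS_TYPES}  # region without detected nuclei
    return {t: counts[t] / total for t in NUCLEUS_TYPES}


print(nuclear_composition(["neoplastic", "neoplastic", "inflammatory", "dead"]))
# {'neoplastic': 0.5, 'non-neoplastic epithelial': 0.0, 'inflammatory': 0.25, 'connective': 0.0, 'dead': 0.25}
```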

\section{Efficiency calculations}
\label{app:calculation}
The table below compares the data usage and model size of our preprocessing pipeline with those of UNI2-h.

\begin{table}[!ht]
\centering
\begin{tabular}{lll}
\toprule
\textbf{Model} & \textbf{Data usage (WSIs)} & \textbf{Parameters} \\
\midrule
UNI2-h & $350{,}000^\dagger$ & $681{,}394{,}176$ \\
\midrule
HoVer-Net & $455^\ddagger$ & $37{,}721{,}166$ \\
Embedding model & $665$ & $11{,}680{,}936$ \\
Preprocessing total & $1{,}120$ & $49{,}402{,}102$ \\
\bottomrule
\end{tabular}
\caption{Overview of data usage and model parameters in the preprocessing pipelines of UNI2-h and our method.}
\end{table}

\noindent
$^\dagger$ The exact number is not reported; the model card only states ``350,000+''.\newline
$^\ddagger$ Counting each visual field as a unique WSI (upper bound).


\section{Hyperparameter details}
\label{app:hyperparameter}
Here we list all hyperparameters used for model training.

\begin{table}[!h]
    \centering
    \begin{tabular}{lc}
        \toprule
        \textbf{Hyperparameter} & \textbf{Value / Range} \\
        \midrule
        \multicolumn{2}{c}{\textit{Fixed Parameters}} \\
        \midrule
        Optimizer & Adam ($\beta_1=0.9, \beta_2=0.999$) \\
        Epochs & 20 \\
        Activation Function & ReLU \\
        MLP Dimension & 384 \\
        Graph Layers & 2 \\
        Graph Embedding Dimension & 512 \\
        \midrule
        \multicolumn{2}{c}{\textit{Sweep Configuration}} \\
        \midrule
        Learning Rate & Log-uniform $[10^{-4}, 10^{-2}]$ \\
        Weight Decay & $\{10^{-3}, 10^{-5}, 10^{-8}\}$ \\
        \bottomrule
    \end{tabular}
    \caption{Hyperparameter settings and search space for model training.}
    \label{tab:hyperparams}
\end{table}
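The sweep configuration in the table above, combined with the selection protocol of Appendix~\ref{app:experiment} (25 trials, five model instances per trial, selection by mean validation score), can be sketched as below. This is a minimal illustration: \texttt{train\_and\_validate} is a hypothetical callback that trains one model instance for a given configuration and seed and returns its validation score, and the other names are likewise illustrative.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

WEIGHT_DECAYS = (1e-3, 1e-5, 1e-8)


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Draw one configuration from the sweep in the table above."""
    lr = 10 ** rng.uniform(-4, -2)  # log-uniform on [1e-4, 1e-2]
    return {"lr": lr, "weight_decay": rng.choice(WEIGHT_DECAYS)}


def random_search(train_and_validate: Callable[[Dict[str, float], int], float],
                  n_trials: int = 25, n_seeds: int = 5, seed: int = 0
                  ) -> Tuple[Dict[str, float], List[float]]:
    """Return the configuration with the best mean validation score over n_seeds runs."""
    rng = random.Random(seed)
    best_cfg, best_scores = None, None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        # Train n_seeds model instances per trial (hypothetical callback).
        scores = [train_and_validate(cfg, s) for s in range(n_seeds)]
        if best_scores is None or mean(scores) > mean(best_scores):
            best_cfg, best_scores = cfg, scores
    return best_cfg, best_scores


# Hypothetical stand-in for training: a score that prefers learning rates near 1e-3.
calls: List[int] = []


def fake_train_and_validate(cfg: Dict[str, float], seed: int) -> float:
    calls.append(seed)
    return -abs(math.log10(cfg["lr"]) + 3)


cfg, scores = random_search(fake_train_and_validate)
print(len(calls), len(scores))  # 125 5
```

The models behind \texttt{best\_scores} are the ones whose hold-out test performance would then be reported.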

\section{Statistics of region sizes}
The table below summarizes the distribution of region sizes. Height and width vary to some extent, as expected and desired for tissue-adaptive regions, but most regions fall within a similar size range.
\begin{table}[!h]
    \centering
    \small
    \begin{tabular}{lrrrrrr}
        \toprule
        \textbf{Dimension} & \textbf{Mean} & \textbf{5\%} & \textbf{25\%} & \textbf{Median} & \textbf{75\%} & \textbf{95\%} \\
        \midrule
        Height & 188 & 96  & 168 & 192 & 216 & 264 \\
        Width  & 194 & 104 & 168 & 192 & 216 & 288 \\
        \bottomrule
    \end{tabular}
    \caption{Distribution of region dimensions ($N=142{,}929$). While the dataset contains regions of varying shapes, the interquartile ranges (25\%\textendash75\%) indicate that the majority of regions fall within a consistent spatial range.}
    \label{tab:region_stats_raw}
    \label{tab:regionsizes}
\end{table}
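Statistics like those in the table above could be computed along the following lines. This is a minimal sketch under two assumptions: that height and width refer to the extent of each region's axis-aligned bounding box in the label map, and that percentiles are obtained by linear interpolation (\texttt{method='inclusive'} in Python's \texttt{statistics.quantiles}). The function names are hypothetical.

```python
from statistics import mean, quantiles
from typing import Dict, List, Sequence, Tuple


def region_bbox_sizes(labels: List[List[int]]) -> Dict[int, Tuple[int, int]]:
    """Height and width (in label-map pixels) of each region's bounding box (assumed definition)."""
    boxes: Dict[int, List[int]] = {}  # label -> [min_row, max_row, min_col, max_col]
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab not in boxes:
                boxes[lab] = [r, r, c, c]
            else:
                b = boxes[lab]
                b[0], b[1] = min(b[0], r), max(b[1], r)
                b[2], b[3] = min(b[2], c), max(b[3], c)
    return {lab: (b[1] - b[0] + 1, b[3] - b[2] + 1) for lab, b in boxes.items()}


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean and percentiles in the layout of the table above (linear interpolation)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points: P1, ..., P99
    return {"mean": mean(values), "5%": cuts[4], "25%": cuts[24],
            "median": cuts[49], "75%": cuts[74], "95%": cuts[94]}


# Toy label map with three regions.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 1],
]
sizes = region_bbox_sizes(labels)
print(sizes)  # {0: (2, 2), 1: (3, 2), 2: (1, 3)}
heights = [h for h, _ in sizes.values()]
print(summarize(heights)["median"])  # 2.0
```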

\section{Additional ablation studies}
This section presents additional ablation studies evaluating the impact of the correlation threshold $\xi$ and group similarity $\tau$ on TCGA-UCEC validation performance. Compared with the TCGA-BRCA results, we observe greater robustness across both hyperparameters. Consistent with previous findings, mild feature pruning improves performance; however, unlike in TCGA-BRCA, larger regions (corresponding to lower $\tau$ values) yield better performance.
\input{results/ablation_ucec}
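The pruning procedure behind the correlation threshold $\xi$ is not restated in this appendix. As a minimal sketch, assuming a greedy rule that keeps a feature only if its absolute Pearson correlation with every previously kept feature is at most $\xi$, pruning could look as follows; under this assumption, a larger $\xi$ removes fewer features. The rule, the feature names, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: constant features are treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(features: Dict[str, Sequence[float]], xi: float) -> List[str]:
    """Greedy pruning (assumed rule): keep a feature only if |r| <= xi to all kept ones."""
    kept: List[str] = []
    for name, values in features.items():  # dict order decides which feature survives
        if all(abs(pearson(values, features[k])) <= xi for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy features measured on four regions.
features = {
    "area": [1.0, 2.0, 3.0, 4.0],
    "perimeter": [2.0, 4.0, 6.0, 8.5],  # almost a linear function of area
    "intensity": [4.0, 1.0, 3.0, 2.0],
}
print(prune_correlated(features, xi=0.9))  # ['area', 'intensity']
print(prune_correlated(features, xi=1.0))  # ['area', 'perimeter', 'intensity']
```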


\section{Distribution of important features}
We analyzed the distribution of the most important features according to their Integrated Gradients attribution scores. Nuclear features are particularly prominent among the top-5 most attributed features, which highlights their value for the prediction.


\begin{figure*}[ht]
\centering
\includegraphics[width=0.6\linewidth]{figures/importantfeatures.png}
\caption{Distribution of features present in the top-5 most attributed features for TCGA-BRCA stage prediction.}
\label{fig:importantfeatures}
\end{figure*}
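The counting behind such a distribution can be sketched as follows: for each slide, take the $k$ features with the largest absolute Integrated Gradients attribution and tally the category of each. Ranking by absolute value, the category assignment, and all feature and function names in the example are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def top_k_features(attributions: Sequence[float], names: Sequence[str], k: int = 5) -> List[str]:
    """Names of the k features with the largest absolute attribution (assumed ranking)."""
    order = sorted(range(len(names)), key=lambda i: abs(attributions[i]), reverse=True)
    return [names[i] for i in order[:k]]


def top_k_category_counts(per_slide_attributions: Sequence[Sequence[float]],
                          names: Sequence[str], category_of: Dict[str, str],
                          k: int = 5) -> Counter:
    """Count how often each feature category appears in the per-slide top-k."""
    counts: Counter = Counter()
    for attributions in per_slide_attributions:
        for name in top_k_features(attributions, names, k):
            counts[category_of[name]] += 1
    return counts


# Hypothetical toy data: four features in three categories, two slides, k = 2.
names = ["nuc_area", "nuc_count", "texture_contrast", "region_width"]
category_of = {"nuc_area": "nuclear", "nuc_count": "nuclear",
               "texture_contrast": "texture", "region_width": "shape"}
slides = [[0.9, -0.1, 0.5, 0.05], [0.1, -0.8, 0.2, 0.3]]
print(dict(top_k_category_counts(slides, names, category_of, k=2)))
# {'nuclear': 2, 'texture': 1, 'shape': 1}
```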


\newpage
\section{Full feature overview}
\label{app:features}
The table below lists all features and indicates whether each is included under different correlation thresholds $\xi$.
\input{sections/feature_table}