\section{Examples of Instances}
\label{appendix:examples}
\input{tables/app-instance-example.tex}
Table~\ref{tab:instanceexample} shows example instances from each dataset.

\section{GPU Hours per Model and Dataset}
Table~\ref{tab:TrainingTime} reports the approximate training time in GPU hours for each model and dataset, averaged over splits.

\input{tables/training_time.tex}

\section{GenBench Evaluation Card}

To describe our experiments concisely, we include a GenBench evaluation card \citep{hupkes2022sotageneralisation} in Table~\ref{tab:eval_card}.


\input{tables/genbench_eval_card}