\section{Experiments}
%
%
%
% ---- Datasets ----
\paragraph{Datasets}
\label{subsec:dataset}
We conduct experiments on a diverse set of publicly available datasets spanning different medical signal modalities to demonstrate the flexibility of our proposed method: a single-lead \texttt{ECG} dataset~\cite{kachuee2018ecg}, a \texttt{Chest X-ray} dataset~\cite{wang2017chestx}, a \texttt{Pneumonia Chest X-ray} dataset~\cite{kermany2018identifying}, a \texttt{Retinal OCT} dataset~\cite{kermany2018identifying}, a \texttt{Fundus Camera} dataset~\cite{liu2022deepdrid}, a \texttt{Dermatoscope} image dataset~\cite{codella2019skin,tschandl2018ham10000}, a \texttt{Colon Histopathology} dataset~\cite{kather2019predicting}, a \texttt{Cell Microscopy} dataset~\cite{ljosa2012annotated}, a \texttt{Brain MRI} dataset~\cite{baid2021rsna,bakas2017advancing,menze2014multimodal}, and a \texttt{Lung CT} dataset~\cite{armato2011lung}. The \texttt{ECG} data are used without further preprocessing. For all 2D datasets, we use the preprocessed versions provided by MedMNIST~\cite{medmnistv1,medmnistv2}. The \texttt{Brain MRI} and \texttt{Lung CT} datasets were preprocessed as described by \citet{friedrich2024wdm}.
%
%
%
% ---- Implementation Details ----
\paragraph{Implementation Details}
\label{subsec:implementation}
All networks were meta-learned with $G=10$ inner-loop steps, and $H=20$ test-time adaptation steps were used to fit unseen signals. The inner-loop learning rate was set to $\alpha=10^{-2}$ and the outer-loop learning rate to $\beta=3\times 10^{-6}$. Network configurations and further training details are reported in \autoref{app:implementation}. All experiments were carried out on a single NVIDIA A100 (\SI{40}{\giga\byte}) GPU. Our implementation is publicly available at \url{https://github.com/pfriedri/medfuncta}.
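To illustrate how the inner loop ($G$ steps with rate $\alpha$), the outer loop (rate $\beta$), and test-time adaptation ($H$ steps) interact, the following is a minimal Python sketch under loudly stated assumptions; it is not the MedFuncta implementation. The network is replaced by a hypothetical linear model $\hat{y} = A\phi$, where the matrix \texttt{A} stands in for the shared parameters $\theta$, $\phi$ is initialized at zero for every signal, and the outer update uses a first-order approximation (second-order terms through the inner loop are dropped) with plain gradient descent instead of whatever optimizer the actual training uses.

```python
import numpy as np

# Hypothetical toy model (NOT the MedFuncta network): y_hat = A @ phi.
# A (N x P) stands in for the shared parameters theta,
# phi (length P) for the signal-specific parameters.


def mse(A, phi, y):
    """Mean squared reconstruction error of one signal."""
    return float(np.mean((A @ phi - y) ** 2))


def grad_phi(A, phi, y):
    """Gradient of the MSE w.r.t. phi: (2/N) A^T (A phi - y)."""
    return 2.0 / y.size * (A.T @ (A @ phi - y))


def adapt(A, y, steps, alpha):
    """Inner loop / test-time adaptation: gradient descent on phi from zero."""
    phi = np.zeros(A.shape[1])
    for _ in range(steps):
        phi = phi - alpha * grad_phi(A, phi, y)
    return phi


def meta_step(A, batch, inner_steps=10, alpha=1e-2, beta=3e-6):
    """One first-order outer-loop update of the shared parameters A.

    The gradient w.r.t. A is evaluated at each signal's adapted phi;
    second-order terms through the inner loop are ignored (FOMAML-style).
    """
    grad_A = np.zeros_like(A)
    for y in batch:
        phi = adapt(A, y, inner_steps, alpha)
        # d/dA mean((A phi - y)^2) = (2/N) (A phi - y) phi^T
        grad_A += 2.0 / y.size * np.outer(A @ phi - y, phi)
    return A - beta * grad_A / len(batch)
```

At test time, only $\phi$ is fitted to an unseen signal while the shared parameters stay frozen, e.g.\ \texttt{adapt(A, y, steps=20, alpha=1e-2)} for $H=20$.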
%
%
%
% ---- Reconstruction Quality ----
\paragraph{Reconstruction Quality}
\label{subsec:reconexp}
We first validate that our proposed approach can fit a wide range of medical signals by performing reconstruction experiments. We meta-learn the shared network parameters $\theta$ on a training set and evaluate the reconstruction quality on a hold-out test set. All models are trained for a fixed number of $\SI{250}{k}$ iterations, and testing is performed using the weights that achieve the best validation scores.
\begin{table}[htbp]
\floatconts
    {tab:reconstruction}
    {\caption{Mean reconstruction quality of our proposed method, evaluated on a hold-out test set after meta-learning for $\SI{250}{k}$ iterations. MSE scores are multiplied by $10^{3}$. Signal sizes are 1D: $187$ samples, 2D: $64\times 64$, 3D: $32 \times 32\times 32$. LPIPS is reported for 2D images only.}}
    {
        \resizebox{0.6\textwidth}{!}{
            \begin{tabular}{ll|cccc}
            \toprule
            Dim. & Dataset & MSE $(\downarrow)$ & PSNR $(\uparrow)$ & SSIM $(\uparrow)$ & LPIPS $(\downarrow)$\\
            \midrule
            1D & \texttt{ECG} & $0.086$ & $43.301$ & $0.964$ & -- \\
            \midrule
            \multirow{7}{*}{2D} & \texttt{Chest X-ray} & $0.097$ & $40.719$ & $0.985$ & $0.013$\\
            & \texttt{Pneumonia Chest X-ray} & $0.146$ & $39.301$ & $0.977$ & $0.014$ \\
            & \texttt{Retinal OCT} & $0.203$ & $37.321$ & $0.934$ & $0.071$\\
            & \texttt{Fundus Camera} & $0.054$ & $43.151$ & $0.978$ & $0.006$\\
            & \texttt{Dermatoscope} & $0.133$ & $40.273$ & $0.962$ & $0.023$\\
            & \texttt{Colon Histopathology} & $0.943$ & $31.886$ & $0.925$ & $0.021$\\
            & \texttt{Cell Microscopy} & $0.013$ & $49.944$ & $0.994$ & $0.008$ \\
            \midrule
            \multirow{2}{*}{3D} & \texttt{Brain MRI} & $0.130$ & $39.191$ & $0.993$ & -- \\
            & \texttt{Lung CT} & $1.561$ & $28.325$ & $0.913$ & --\\
            \bottomrule
            \end{tabular}
        }
    }
\end{table}
We measure mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and learned perceptual image patch similarity (LPIPS)~\cite{zhang2018unreasonable} and report the results in \autoref{tab:reconstruction}. Qualitative reconstruction examples are shown in \autoref{fig:reconstruction}.
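The section does not state how the per-signal scores are aggregated. A minimal sketch of MSE and PSNR, assuming intensities in $[0,1]$ and averaging over signals (both assumptions), is given below; SSIM and LPIPS are available, for example, as \texttt{skimage.metrics.structural\_similarity} in scikit-image and via the \texttt{lpips} package. Under per-signal averaging, the mean PSNR is at least the PSNR of the mean MSE (Jensen's inequality), so the MSE and PSNR columns need not convert into each other directly: a mean MSE of $0.097\times 10^{-3}$ corresponds to $\approx\SI{40.1}{\decibel}$, whereas $\SI{40.719}{\decibel}$ is reported for \texttt{Chest X-ray}.

```python
import numpy as np


def mse_psnr(x, x_hat, data_range=1.0):
    """Per-signal MSE and PSNR; assumes intensities lie in [0, data_range]."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    err = float(np.mean((x - x_hat) ** 2))
    if err == 0.0:
        return err, float("inf")  # perfect reconstruction
    return err, float(10.0 * np.log10(data_range ** 2 / err))


def dataset_scores(pairs, data_range=1.0):
    """Average per-signal scores over a test set; MSE is scaled by 1e3."""
    scores = [mse_psnr(x, x_hat, data_range) for x, x_hat in pairs]
    mean_mse = float(np.mean([s[0] for s in scores]))
    mean_psnr = float(np.mean([s[1] for s in scores]))
    return 1e3 * mean_mse, mean_psnr
```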
\begin{figure}[ht]
    \centering
    \resizebox{0.8\textwidth}{!}{
        \begin{tikzpicture}
            % First row: Input images
            \node[rotate=90] at (-0.125, 3.2)           {\small Input};
            \node[] at (0, 2)    [anchor=south west]  {\includegraphics[height=2cm]{images/chest/input.png}};
            \node[] at (2, 2)    [anchor=south west]  {\includegraphics[height=2cm]{images/pneum/input.png}};
            \node[] at (4, 2)    [anchor=south west]  {\includegraphics[height=2cm]{images/oct/input.png}};
            \node[] at (6, 2)    [anchor=south west]  {\includegraphics[height=2cm]{images/fundus/input.png}};
            \node[] at (8, 2)    [anchor=south west]  {\includegraphics[height=2cm]{images/derma/input.png}};
            \node[] at (10, 2)   [anchor=south west]  {\includegraphics[height=2cm]{images/histo/input.png}};
            \node[] at (12, 2)   [anchor=south west]  {\includegraphics[height=2cm]{images/micro/input.png}};

            % Second row: Reconstruction images
            \node[rotate=90] at (-0.125, 1.2)           {\small Recon.};
            \node[] at (0, 0)    [anchor=south west]  {\includegraphics[height=2cm]{images/chest/recon.png}};
            \node[] at (2, 0)    [anchor=south west]  {\includegraphics[height=2cm]{images/pneum/recon.png}};
            \node[] at (4, 0)    [anchor=south west]  {\includegraphics[height=2cm]{images/oct/recon.png}};
            \node[] at (6, 0)    [anchor=south west]  {\includegraphics[height=2cm]{images/fundus/recon.png}};
            \node[] at (8, 0)    [anchor=south west]  {\includegraphics[height=2cm]{images/derma/recon.png}};
            \node[] at (10, 0)   [anchor=south west]  {\includegraphics[height=2cm]{images/histo/recon.png}};
            \node[] at (12, 0)   [anchor=south west]  {\includegraphics[height=2cm]{images/micro/recon.png}};
        \end{tikzpicture}
        }
    \caption{Input and reconstruction examples from the hold-out test set for \textit{(from left to right)} \texttt{Chest X-ray}, \texttt{Pneumonia Chest X-ray}, \texttt{Retinal OCT}, \texttt{Fundus Camera}, \texttt{Dermatoscope}, \texttt{Colon Histopathology}, and \texttt{Cell Microscopy} images.}
    \label{fig:reconstruction}
\end{figure}
Performance is generally better on homogeneous datasets, where redundancies across signals can be exploited more effectively; nevertheless, the proposed method also learns to represent complex, heterogeneous datasets such as \texttt{Colon Histopathology}. Additional qualitative results can be found in \autoref{sec:additionalqual}.
%
%
%
\paragraph{Scaling MedFuncta to High-Resolution Signals}
\label{subsec:scalingexp}
To assess the computational efficiency and scalability of our approach, we additionally evaluate it on higher-resolution signals. To this end, we perform reconstruction experiments on the \texttt{Chest X-ray} and \texttt{Dermatoscope} datasets, using images with resolutions of $128 \times 128$ and $224 \times 224$ as supervision signals. Reconstruction scores after \SI{250}{k} meta-learning steps are reported in \autoref{tab:reconstruction_highres}.
\begin{table}[htbp]
\floatconts
  {tab:reconstruction_highres}
  {\caption{Mean reconstruction quality of MedFuncta at higher resolutions. We use the setup from \autoref{app:implementation}, changing only the batch size $B$, the representation size $P$, and the selection ratio $\gamma$, and setting $\omega_1=30$ and $\omega_K=300$. We also report the GPU memory required for training in GB. MSE scores are multiplied by $10^{3}$.}}
  {\resizebox{0.6\textwidth}{!}{
  \begin{tabular}{llccc|cccc}
  \toprule
  Dataset & Resolution & $B$ & $P$ & $\gamma$ & MSE $(\downarrow)$ & PSNR $(\uparrow)$ & SSIM $(\uparrow)$ & Mem. (GB) $(\downarrow)$\\
  \midrule
  \multirow{2}{*}{\texttt{Chest X-ray}} & $128 \times 128 $ & $8$ & $8192$ & $0.25$ & $0.216$ & $37.174$ & $0.952$ & $28.68$ \\
  & $224 \times 224 $ & $4$ & $16384$ & $0.10$ & $0.401$ & $34.510$ & $0.909$ & $25.39$\\\midrule
  \multirow{2}{*}{\texttt{Dermatoscope}} & $128 \times 128 $ & $8$ & $8192$ & $0.25$ & $0.277$ & $37.072$ & $0.906$ & $28.68$\\
  & $224 \times 224 $ & $4$ & $16384$ & $0.10$ & $0.472$ & $34.752$ & $0.920$ & $25.39$\\
  \bottomrule
  \end{tabular}}}
\end{table}
The results demonstrate that our method can reconstruct high-resolution signals, even when trained on a single \SI{40}{\giga\byte} GPU. Qualitative results are shown in \autoref{sec:additionalqual}. We expect larger networks and longer, distributed training to further improve performance, especially on the $224\times 224$ data, for which hardware constraints required a smaller batch size ($B=4$) and a lower selection ratio ($\gamma=0.10$) than at $128\times 128$.
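The settings in \autoref{tab:reconstruction_highres} can be put into perspective with a short calculation of the number of pixel coordinates per image, the number of pixels per entry of $\phi$ (per channel, assuming $P$ counts the entries of $\phi$), and the number of inner-loop context coordinates $\lfloor\gamma N\rfloor$ (the rounding is an assumption).

```python
import math

# (height, width, representation size P, selection ratio gamma),
# taken from the high-resolution table.
SETTINGS = {
    "128x128": (128, 128, 8192, 0.25),
    "224x224": (224, 224, 16384, 0.10),
}


def summarize(height, width, p, gamma):
    """Pixel count, pixels per entry of phi, and inner-loop context size."""
    n = height * width  # number of pixel coordinates per image (per channel)
    return {
        "pixels": n,
        "pixels_per_latent": n / p,
        # Rounding down is an assumption; the paper does not specify it.
        "context": math.floor(gamma * n),
    }


for name, cfg in SETTINGS.items():
    print(name, summarize(*cfg))
```

This gives 4096 context coordinates at $128\times128$ and 5017 at $224\times224$: although the pixel count grows roughly threefold, the inner-loop context stays of similar size, which, together with the halved batch size, plausibly explains why the reported memory does not increase.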
%
%
%
% ---- Classification Experiments ----
\paragraph{Classification Experiments}
To assess whether the learned representation captures relevant information about the underlying signal, we perform classification experiments on the signal-specific parameters $\phi$~\cite{dupont2022data,navon2023equivariant}, using a $k$-Nearest-Neighbor ($k$-NN) classifier or a 3-layer MLP with ReLU activations and dropout. We compare these simple classifiers, applied to our MedFuncta representation, with ResNet50~\cite{he2016deep} and EfficientNet-B0~\cite{tan2019efficientnet}, applied to the original images, and report the number of network parameters, training time, accuracy, and F1 scores. All trainable models were trained for 50 epochs using AdamW with a learning rate of $10^{-3}$. \autoref{tab:class} reports the classification performance on a hold-out test set, using the model parameters that yield the highest validation accuracy.
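The $k$-NN classifier has no trainable parameters, which is why \autoref{tab:class} lists zero parameters and zero training time for it. A minimal sketch of $k$-NN on flattened $\phi$ vectors is given below; the Euclidean distance and the tie-breaking rule (smallest label wins) are assumptions, as the section does not specify them. For large datasets, the distance matrix would be computed in chunks.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=1):
    """Majority-vote k-NN on flattened phi vectors with Euclidean distance.

    train_x: (n_train, d) floats, train_y: (n_train,) non-negative ints,
    test_x: (n_test, d) floats. Ties are broken in favor of the smallest label.
    """
    # Squared Euclidean distances, shape (n_test, n_train).
    d2 = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest
    votes = train_y[nearest]                  # (n_test, k) neighbor labels
    n_classes = int(train_y.max()) + 1
    counts = np.stack([np.bincount(v, minlength=n_classes) for v in votes])
    return counts.argmax(axis=1)              # argmax picks the first max
```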
\begin{table}[htbp]
\floatconts
    {tab:class}
    {\caption{Classification performance. We report the number of network parameters, the training time in seconds, accuracy (\%), and F1 scores.}}
    {
        \resizebox{0.6\columnwidth}{!}{
            \begin{tabular}{ll|cccc}
            \toprule
            Dataset (Classes) & Classifier & Param. & Time $(\downarrow)$ & Acc. $(\uparrow)$ & F1 $(\uparrow)$\\\midrule
            \multirow{5}{*}{\texttt{Pneumonia Chest X-ray} $(2)$} & $k$-NN $(k=1)$ on $\phi$& $0$ & $0$ & $81.57$ & $0.87$  \\
            & $k$-NN $(k=3)$ on $\phi$& $0$ & $0$ & $80.93$ & $0.87$  \\
            & MLP on $\phi$ & $1.2\times 10^6$ & $45$ & $\mathbf{89.10}$ & $\mathbf{0.88}$  \\
            & ResNet50 & $23.5\times10^6$ & $450$ & $83.49$ & $0.80$\\
            & EfficientNet-B0 & $4.0\times10^6$ & $270$ & $84.46$ & $0.82$ \\\midrule
            \multirow{5}{*}{\texttt{Dermatoscope} $(7)$} & $k$-NN $(k=1)$ on $\phi$& $0$ & $0$ & $68.98$ & $0.38$ \\
            & $k$-NN $(k=3)$ on $\phi$& $0$ & $0$ & $69.28$ & $0.32$ \\
            & MLP on $\phi$& $1.2\times 10^6$ & $65$ & $\mathbf{74.96}$ & $0.48$ \\
            & ResNet50 & $23.5\times10^6$ & $700$ & $74.36$ & $\mathbf{0.49}$ \\
            & EfficientNet-B0 & $4.0\times10^6$ & $410$ & $70.42$ & $0.44$ \\
            \bottomrule
            \end{tabular}
        }
    }
\end{table}
Classification on our representation $\phi$ works well for both the binary and the multi-class task. The MLP on $\phi$ achieves higher accuracy than both ResNet50 and EfficientNet-B0 applied to the original images, with competitive F1 scores, while requiring less training time and fewer parameters. These results indicate that our representation captures informative features of the underlying signals. One possible explanation for the improvement is that redundant signal components shared across the dataset are captured by $\theta$ rather than by $\phi$.
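The section does not state which F1 averaging is used. For an imbalanced multi-class dataset such as \texttt{Dermatoscope}, accuracy and macro-averaged F1 can diverge considerably, because every class contributes equally to the macro average regardless of its frequency. A minimal sketch of macro-F1, assuming this averaging (it corresponds to scikit-learn's \texttt{f1\_score} with \texttt{average="macro"}), is:

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(f1s))
```

For example, predicting the majority class for 9 of 9 majority samples and 1 minority sample yields $90\,\%$ accuracy but a macro-F1 of only $9/19 \approx 0.47$.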
%
%
%
% ---- Ablation Studies ----
\paragraph{Ablation Studies}
\label{subsec:ablations}
To validate our proposed \textbf{context reduction strategy}, we study the effect of the context selection ratio $\gamma$ on the reconstruction quality. The results, presented in \autoref{tab:ablation_selectionratio}, show that reducing the context set in the inner loop substantially lowers the required GPU memory at the cost of only marginal performance drops. We identify a selection ratio of $\gamma=0.25$ as a good trade-off: compared to using the full context set, it reduces GPU memory usage to about one third ($\SI{11.43}{\giga\byte}$ vs.\ $\SI{34.34}{\giga\byte}$) and cuts the required training time by more than $\SI{50}{\%}$, while losing less than $\SI{1}{dB}$ in PSNR and $0.004$ in SSIM.
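This section does not specify how the reduced context set $\mathcal{C}_{\mathtt{red}}$ is selected. A minimal sketch, assuming uniform random sampling of $\lfloor\gamma N\rfloor$ coordinate/value pairs without replacement at every inner loop (error-based selection would be an alternative), is shown below; the inner-loop loss is then evaluated on the sampled subset only, so the activations stored for backpropagation scale with the number of context points. This is consistent with the steady growth of memory with $\gamma$ in \autoref{tab:ablation_selectionratio}. The helper names are hypothetical.

```python
import numpy as np


def coordinate_grid(height, width):
    """All (row, col) pixel coordinates normalized to [-1, 1], shape (H*W, 2)."""
    rows, cols = np.meshgrid(
        np.linspace(-1.0, 1.0, height), np.linspace(-1.0, 1.0, width), indexing="ij"
    )
    return np.stack([rows.ravel(), cols.ravel()], axis=-1)


def reduced_context(coords, values, gamma, rng):
    """Uniformly sample floor(gamma * N) coordinate/value pairs without replacement."""
    n = coords.shape[0]
    m = max(1, int(np.floor(gamma * n)))
    idx = rng.choice(n, size=m, replace=False)
    return coords[idx], values[idx]
```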
\begin{table}[htbp]
\floatconts
  {tab:ablation_selectionratio}
  {\caption{The effect of the context selection ratio $\gamma$ on the reconstruction quality and on the GPU memory required for training. Measured on the \texttt{Chest X-ray} dataset $(64 \times 64)$ after \SI{100}{k} iterations with a batch size of 12, using the baseline configuration. We also report the training time per 100 iterations, averaged over $\SI{50}{k}$ iterations following $\SI{2}{k}$ warm-up iterations.}}
  {\resizebox{0.6\textwidth}{!}{
  \begin{tabular}{p{5cm}|p{1.2cm}p{1.2cm}p{1.2cm}p{1.2cm}p{1.2cm}}
  \toprule
  Selection Ratio $(\gamma)$ & $0.1$ & $0.25$ & $0.5$ & $0.75$ & $1.0$ \\\midrule
  PSNR (dB) & $34.33$ & $35.62$ & $36.22$ & $36.53$ & $36.60$\\
  SSIM & $0.941$ & $0.955$ & $0.958$ & $0.959$ & $0.959$\\
  Memory [GB] & $6.77$ & $11.43$ & $19.32$ & $28.66$ & $34.34$\\
  Time [s] / 100 iterations & $37.38$ & $42.04$ & $68.75$ & $83.18$ & $91.61$ \\
  \bottomrule
  \end{tabular}
  }}
\end{table}
To assess the effectiveness of our proposed \textbf{$\boldsymbol{\omega}$-schedule}, we sweep over all combinations of $\omega_1\in\{10, 20, 30, 40, 50\}$ and $\omega_K:=\delta\omega_1$ with $\delta\in\{1, 2, 5, 10, 20\}$, where $\delta=1$ recovers the setting with a single $\omega$-parameter across all layers. The results, shown in \autoref{fig:grid_search}, demonstrate that applying our proposed schedule ($\delta>1$) consistently improves performance over this setting. We further observe a performance drop for large $\omega_K$-values, which we attribute to training collapse. In our experiments, reducing $\omega_K$ proved effective in mitigating such instabilities, especially for high-dimensional signals.
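How the frequencies of the intermediate layers are set is not spelled out in this section. A minimal sketch, assuming SIREN-style sine layers $h \leftarrow \sin(\omega_k(h W_k + b_k))$ with $\omega_k$ interpolated linearly from $\omega_1$ to $\omega_K$ across the sine layers (both hypothetical choices), illustrates where the schedule enters the network and enumerates the $5\times 5$ sweep.

```python
import numpy as np


def omega_schedule(omega_1, delta, num_layers):
    """Per-layer frequencies from omega_1 to omega_K = delta * omega_1.

    Linear interpolation is an assumption; delta = 1 gives one shared omega.
    """
    return np.linspace(omega_1, delta * omega_1, num_layers)


def siren_forward(x, weights, biases, omegas):
    """Sine layers h <- sin(omega_k * (h W_k + b_k)), then a linear output layer."""
    h = x
    for W, b, omega in zip(weights[:-1], biases[:-1], omegas):
        h = np.sin(omega * (h @ W + b))
    return h @ weights[-1] + biases[-1]


# The ablation sweep: 25 (omega_1, delta) configurations.
GRID = [(w1, d) for w1 in (10, 20, 30, 40, 50) for d in (1, 2, 5, 10, 20)]
```

For instance, \texttt{omega\_schedule(30, 10, 4)} yields the frequencies $30, 120, 210, 300$, matching $\omega_1=30$ and $\omega_K=300$ from the high-resolution runs.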
\begin{figure}[htbp]
    \floatconts
    {fig:grid_search}
    {\caption{Grid search over different $\omega_1$ and $\delta$ parameters. We report $(a)$ PSNR, $(b)$ SSIM, and $(c)$ LPIPS after \SI{25}{k} iterations. Outliers with red borders were excluded from color scaling. Measured on the \texttt{Chest X-ray} dataset $(64 \times 64)$.}}
    {
        \subfigure{
            \label{fig:grid_search_psnr}
            \includegraphics[width=0.3\textwidth]{images/grid_search_psnr.pdf}
        }\,
        \subfigure{
            \label{fig:grid_search_ssim}
            \includegraphics[width=0.305\textwidth]{images/grid_search_ssim.pdf}
        }\,
        \subfigure{
            \label{fig:grid_search_lpips}
            \includegraphics[width=0.3\textwidth]{images/grid_search_lpips.pdf}
        }
    }
\end{figure}
\begin{table}[htbp]
\floatconts
  {tab:ablationothers}
  {\caption{The effect of introducing a global learning rate schedule, our proposed $\omega$-schedule, or a reduced context set $\mathcal{C}_{\mathtt{red}}$ with $\gamma=0.25$, and of combining all three, compared to other approaches. BS denotes the batch size. Measured on the \texttt{Chest X-ray} dataset $(64 \times 64)$ after $\SI{250}{k}$ training iterations. MSE scores are multiplied by $10^{3}$.}}
  {
  \resizebox{0.8\textwidth}{!}{
  \begin{tabular}{llcc|cccc}
  \toprule
  Method & BS & Param. & Memory (GB)& MSE $(\downarrow)$ & PSNR $(\uparrow)$ & SSIM $(\uparrow)$ & LPIPS $(\downarrow)$ \\\midrule
  Functa & $12$ & $17.1\times 10^{6}$ & $25.13$ & $0.403$ & $34.304$ & $0.940$ & $0.059$ \\
  COIN++ & $12$ & $11.5 \times 10^6$ & $15.74$ & $0.379$ & $34.566$ & $0.940$ & $0.056$ \\\midrule
  SpatialFuncta & $12$ & $2.1\times10^6$ & $10.69$ & $0.587$ & $32.782$ & $0.915$ & $0.149$ \\
  COIN++ (patched) & $12$ & $4.5 \times 10^6$ & $4.29$ & $0.153$ & $38.477$ & $0.976$ & $0.014$ \\
  \midrule
  Ours (\textit{Baseline}) & $12$ & $7.7\times 10^{6}$ & $34.34$ & $0.179$ & $37.836$ & $0.970$ & $0.016$\\
  Ours \quad + \textit{Global lr-sched} & $12$ & $7.7\times 10^{6}$ & $34.34$ & $0.168$ & $38.282$ & $0.973$ & $0.017$ \\
  Ours \quad + \textit{$\omega$-Schedule} & $12$ & $7.7\times 10^{6}$ & $34.34$ & $0.119$ & $39.684$ & $0.979$ & $0.013$ \\
  Ours \quad + \textit{$\mathcal{C}_{\mathtt{red}}$}$(\gamma=0.25)$ & $24$ & $7.7\times 10^{6}$ & $21.51$ & $0.205$ & $37.338$ & $0.968$ & $0.019$ \\
  Ours \quad + \textit{All} & $24$ & $7.7\times 10^{6}$ & $21.51$ & $\mathbf{0.097}$ & $\mathbf{40.719}$ & $\mathbf{0.985}$ & $\mathbf{0.013}$\\
  \bottomrule
  \end{tabular}}
  }
\end{table}
Finally, we compare our approach to Functa~\cite{dupont2022data}, two variants of COIN++~\cite{dupont2022coin++} (one operating on full images and one on $32 \times 32$ image patches), and SpatialFuncta~\cite{bauer2023spatial}. We additionally study the effect of introducing a global learning rate schedule, our proposed $\omega$-schedule, or the proposed context reduction scheme $\mathcal{C}_{\mathtt{red}}$. Implementation details for all compared methods can be found in \autoref{app:impl_comp}. The results in \autoref{tab:ablationothers} show that our full approach (Ours + \textit{All}) outperforms Functa by $\sim\SI{6.4}{\decibel}$ PSNR. The global learning rate schedule and the $\omega$-schedule each improve MSE, PSNR, and SSIM over the baseline. $\mathcal{C}_{\mathtt{red}}$ alone slightly lowers reconstruction quality, but it reduces the required GPU memory from $\SI{34.34}{\giga\byte}$ to $\SI{21.51}{\giga\byte}$ while doubling the batch size, which we consider a practical improvement. With all components combined, we also surpass SpatialFuncta and both COIN++ variants, including the patched version, whose signals contain $4\times$ fewer pixels ($32\times 32$ patches instead of $64\times 64$ images). We believe that incorporating the findings from this paper into patch-based methods could further enhance their quality and scalability; exploring patch-based representations, however, was beyond the scope of this work. Qualitative comparisons between all methods are provided in \autoref{sec:addcomp}.