\section{Method}
In this work, we aim to find parameter-efficient NFs for $N$ signals 
$\big\{s_{1}, \ldots, s_{N} : \mathbb R^C \to \mathbb R^D\big\}$, e.g., a set of time series or images, by learning a functional representation of each signal $s_i$ given a context set $\mathcal{C}^{(i)}:=\big\{(\mathbf{x}_j,\mathbf{y}_j)\big\}_{j=1}^{M}$ of $M$ coordinate-value pairs $(\mathbf{x}_j,\mathbf{y}_j) \in \mathbb{R}^C \times \mathbb{R}^D$. Parameterizing such a function as a neural network $f_{\theta}:\mathbb{R}^C\to\mathbb{R}^D$ whose parameters $\theta$ are all optimized to fit a single signal is widely explored~\cite{sitzmann2020implicit,mildenhall2021nerf,saragadam2023wire}, but this approach becomes prohibitively expensive when scaled to large datasets.
%
%
%
% ---- Network Design ----
\paragraph{Network Architecture}
\label{subsec:model}
We argue that most sets of signals (datasets) contain large amounts of redundant information or structure that we can learn over the entire set. This is particularly true in medicine, where patients exhibit broadly similar yet slightly varying anatomies. We therefore define a neural network $f_{\theta,\phi^{(i)}}:\mathbb{R}^C\rightarrow\mathbb{R}^D$ with shared network parameters $\theta$ that represent this redundant information and additional signal-specific parameters $\phi^{(i)}\in\mathbb{R}^{P}$ that condition the base network to represent a specific signal $s_i$. We use a $K$-layer MLP architecture with a hidden dimension of $L$ and FiLM-modulated SIREN activations \cite{sitzmann2020implicit,mehta2021modulated}, where all layers $k\in\{2, \ldots, K-1\}$ are defined as:
\begin{equation}
    x \mapsto \sin\Big(\omega_{k}\big(\underbrace{\mathbf{W}_kx+\mathbf{b}_k}_{\text{Linear}}+\underbrace{\mathbf{m}_k(\phi^{(i)})}_{\text{Modulation}}\big)\Big),
\end{equation}
with $\omega_k$ being the layer's frequency parameter, $\mathbf{W}_k$ and $\mathbf{b}_k$ being the weights and biases of the $k$-th layer, and $\mathbf{m}_k(\cdot)$ being a linear layer that maps the signal-specific parameters $\phi^{(i)}$ to a shift-modulation vector that is added inside the base network's nonlinearity \cite{perez2018film}. The first layer is a SIREN layer that projects the input coordinate to a higher-dimensional space. The last layer is a linear layer that maps to the desired output dimension. An overview of the proposed architecture is shown in \autoref{fig:network}.
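Written out under these definitions, one reading of the full forward pass is the following sketch. Here $\mathbf{h}_k$ denotes the output of layer $k$, and $\mathbf{A}_k\in\mathbb{R}^{L\times P}$ and $\mathbf{c}_k\in\mathbb{R}^{L}$ are notation introduced only for this illustration to denote the weights and bias of the modulation layer $\mathbf{m}_k$ (whether $\mathbf{m}_k$ carries a bias term is an assumption):
\begin{equation}
\begin{aligned}
    \mathbf{h}_1 &= \sin\big(\omega_1(\mathbf{W}_1\mathbf{x}+\mathbf{b}_1)\big), \\
    \mathbf{h}_k &= \sin\Big(\omega_k\big(\mathbf{W}_k\mathbf{h}_{k-1}+\mathbf{b}_k+\mathbf{A}_k\phi^{(i)}+\mathbf{c}_k\big)\Big), \quad k\in\{2,\ldots,K-1\}, \\
    f_{\theta,\phi^{(i)}}(\mathbf{x}) &= \mathbf{W}_K\mathbf{h}_{K-1}+\mathbf{b}_K.
\end{aligned}
\end{equation}
In this reading, the shared parameters $\theta$ collect all $\mathbf{W}_k$, $\mathbf{b}_k$, $\mathbf{A}_k$, and $\mathbf{c}_k$, while only the $P$ entries of $\phi^{(i)}$ differ between signals.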
%
%
%
% ---- Network Initialization ----
\paragraph{Network Initialization}
\label{subsec:init}
A proper initialization of NFs has been shown to have a strong influence on the convergence and overall performance of the applied networks \cite{kaniafresh2025,yeomfast2025}. We therefore initialize the network's weights and biases similarly to \citet{sitzmann2020implicit}: 
\begin{equation}
    \mathbf{W}_k, \mathbf{b}_k \sim \mathcal{U}\left(-\frac{\sqrt{6/n}}{\omega_{k}},\frac{\sqrt{6/n}}{\omega_{k}}\right),
\end{equation}
with $n$ being the layer's input dimension. The first layer's weights and biases are initialized as ${\mathbf{W}_1,\mathbf{b}_1\sim\mathcal{U}(-1/n, 1/n)}$.
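As a purely illustrative calculation with hypothetical values (an input dimension of $n=256$ and $\omega_k=30$, neither of which is prescribed above), the bound of the uniform distribution for a hidden layer evaluates to
\begin{equation}
    \frac{\sqrt{6/256}}{30}\approx\frac{0.153}{30}\approx 5.1\times10^{-3},
\end{equation}
whereas a first layer with, e.g., $n=2$ input coordinates would draw from $\mathcal{U}(-0.5, 0.5)$. Since the bound scales with $1/\omega_k$, doubling $\omega_k$ halves the initial weight range, so the product $\omega_k\mathbf{W}_k$ that enters the sine follows $\mathcal{U}\big(-\sqrt{6/n},\sqrt{6/n}\big)$ regardless of the chosen $\omega_k$.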
%
%
%
% ---- Omega-Schedule ----
\paragraph{Introducing an \texorpdfstring{$\boldsymbol{\omega}$}{Omega}-Schedule}
\label{subsubsec:schedule}
While prior work treats $\omega$ as a single hyperparameter that remains constant over all network layers \cite{sitzmann2020implicit,dupont2022data}, we identify this as a major restriction in a generalization setting. We therefore propose an \textbf{$\boldsymbol{\omega}$-schedule} that linearly increases from $\omega_{1}$ to $\omega_{K}$ and find that this is equivalent to a \textit{layer-wise learning rate schedule} that positively influences the network's learning dynamics.
By carefully analyzing the interplay between a layer's $\omega$-parameter and its learning rate $\tau$ (detailed derivation in \autoref{sec:omega_analysis}), we find that two layers with indices $m$ and $n$ and different $\omega$-values $\omega_{m} \neq \omega_{n}$ exhibit the following relation:
\begin{equation}
\label{eq:lr_vs_omega}
    \frac{\tau_n}{\tau_m} = \left(\frac{\omega_{m}}{\omega_{n}}\right)^{2}.
\end{equation} 
This means that both layers show the same behavior if we rescale the learning rates according to the inverse quadratic relationship $\tau \propto \frac{1}{\omega^2}$. This observation offers a so-far overlooked perspective on the $\omega$-parameter in SIREN networks and establishes a connection to recent research on learning dynamics, providing a theoretical justification for introducing the proposed $\omega$-schedule. \citet{chen2023which} showed that shallow MLP layers yield faster convergence due to more informative gradients and a smoother loss landscape. They formalize this as the \emph{layer convergence bias}, arguing that training strategies that prioritize low-frequency representations in shallow layers, while deferring high-frequency details to deeper layers, achieve better performance. They further find that shallow layers tolerate higher learning rates, whereas deeper layers begin to learn effectively once the learning rate decays. Our perspective on the $\omega$-parameter in SIRENs naturally fits into this framework. By gradually increasing $\omega$ with depth, we effectively lower the learning rate of deeper layers. This enforces a staged optimization dynamic, where shallow layers first stabilize around smooth, low-frequency features, and deeper layers subsequently refine high-frequency details.
We validate this theoretical insight through ablation studies (\autoref{subsec:ablations}), demonstrating that networks incorporating our proposed $\omega$-schedule outperform current state-of-the-art networks with a constant $\omega$-parameter.
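To make the schedule concrete, one plausible reading of ``linearly increasing'' is an evenly spaced interpolation over the layer index. This closed form is an assumption made here for illustration, not a statement of the exact implementation:
\begin{equation}
    \omega_k = \omega_1 + \frac{k-1}{K-1}\,\big(\omega_K-\omega_1\big), \qquad k\in\{1,\ldots,K\}.
\end{equation}
With the hypothetical values $K=5$, $\omega_1=10$, and $\omega_K=30$, this yields $(\omega_1,\ldots,\omega_5)=(10, 15, 20, 25, 30)$. Inserting $m=2$ and $n=4$ into \autoref{eq:lr_vs_omega} then gives
\begin{equation}
    \frac{\tau_4}{\tau_2} = \left(\frac{\omega_2}{\omega_4}\right)^{2} = \left(\frac{15}{25}\right)^{2} = 0.36,
\end{equation}
i.e., under this relation the two layers behave identically when the learning rate of layer $4$ is $0.36$ times that of layer $2$.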
%
%
%
% ---- Meta-Learning Shared Network Parameters ----
\begin{figure}[htbp]
\floatconts
  {fig:meta_learn}
  {\caption{\textit{(Left)} The proposed approach for meta-learning the shared model parameters $\theta$. An algorithm describing the full meta-learning approach can be found in \autoref{supp:secord}. \textit{(Right)} The proposed test-time adaptation scheme.}}
  {\includegraphics[width=\linewidth]{images/LearningFitting.pdf}}
\end{figure}
\paragraph{Efficient Meta-Learning with Context Reduction}
\label{subsec:metalearning}
To efficiently create a set of NFs, we aim to meta-learn the shared parameters $\theta$ such that we can fit a signal $s_i$ by only optimizing $\phi^{(i)}$ for \emph{very few} update steps (see \autoref{fig:network}). We follow a CAVIA approach \cite{zintgraf2019fast}, shown in \autoref{fig:meta_learn}, by defining an optimization process over the shared model parameters:
\begin{equation}
\label{eq:metaobjective}
    \theta^{*} = \argmin_{\theta}\frac{1}{N}\sum_{i=1}^{N}\mathcal{L}_{\mathtt{MSE}}(\phi_G^{(i)},\theta;\mathcal{C}^{(i)}),
\end{equation}
where in each \emph{meta/outer-loop} update step, the \emph{inner-loop} optimizes $\phi^{(i)}$ from scratch ($\phi_0^{(i)} := \mathbf{0}$), performing $G$ update steps $\phi_{g+1}^{(i)} := \phi_g^{(i)}- \alpha \nabla_\phi \mathcal L_{\mathtt{MSE}}(\phi_g^{(i)}, \theta; \mathcal{C}^{(i)})$, using stochastic gradient descent (SGD) with a fixed learning rate $\alpha$. The meta-update is performed using AdamW \cite{loshchilov2019decoupled} with a learning rate $\beta$ that follows a cosine annealing learning rate schedule \cite{loshchilov2016sgdr}.
All optimization steps aim to minimize the reconstruction error of the learned function $f_{\theta, \phi^{(i)}_{g}}$ on a given context set $\mathcal{C}^{(i)}$, measured by the mean squared error (MSE) loss:
\begin{equation}
\label{eq:mse}
    \mathcal{L}_{\mathtt{MSE}}(\phi^{(i)}_{g}, \theta; \mathcal{C}^{(i)}):=\frac{1}{\vert \mathcal{C}^{(i)} \vert }\sum_{(\mathbf{x}_j,\mathbf{y}_j) \in \mathcal{C}^{(i)}} \| f_{\theta, \phi^{(i)}_{g}}(\mathbf{x}_j) - \mathbf{y}_j\|_{2}^{2}.
\end{equation}
Performing a single meta-update step involves backpropagating through the entire inner-loop optimization, which requires retaining the computational graph in GPU memory to compute second-order gradients \cite{finn2017model}.\footnote{Updating $\theta$ necessitates backpropagation through all inner-loop parameters $\phi_{1:G}$, each of which is itself a function of $\theta$. Consequently, computing the update for $\theta$ involves Hessian-vector products, which in turn demand storing the complete inner-loop computational graph. More information can be found in \autoref{supp:secord}.} This resource-intensive procedure does not scale well to high-dimensional signals. While first-order approximations \cite{finn2017model,nichol2018first} and auto-decoder training approaches that do not rely on second-order optimization exist \cite{park2019deepsdf}, recent research has shown that they result in severe performance drops or unstable training \cite{dupont2022data,dupont2022coin++}. To overcome this limitation, we propose to use a \textbf{reduced context set} $\mathcal{C}_{\mathtt{red}}^{(i)}$ during the inner-loop optimization \cite{tack2023learning}. This reduced context set is a subset of the full context set, $\mathcal{C}_{\mathtt{red}}^{(i)}\subseteq\mathcal{C}^{(i)}$, which reduces the GPU memory required for second-order optimization. We obtain the reduced context set by randomly sampling $\gamma|\mathcal{C}^{(i)}|$ coordinate-value pairs from $\mathcal{C}^{(i)}$, with selection ratio $\gamma\in(0,1]$. We empirically find that reducing the selection ratio $\gamma$ results in only marginal performance drops, while significantly reducing the required GPU memory and speeding up training (see \autoref{tab:ablation_selectionratio}).
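To illustrate where the second-order term arises and why the reduced context set helps, consider the simplest case of a single inner-loop step ($G=1$). The following is a sketch under the assumption that the inner step uses $\mathcal{C}_{\mathtt{red}}^{(i)}$ while the outer loss is evaluated on $\mathcal{C}^{(i)}$ as written in \autoref{eq:metaobjective}. The inner update reads $\phi_1^{(i)} = \phi_0^{(i)} - \alpha\nabla_\phi\mathcal{L}_{\mathtt{MSE}}(\phi_0^{(i)},\theta;\mathcal{C}_{\mathtt{red}}^{(i)})$, and the chain rule gives
\begin{equation}
    \frac{\mathrm{d}}{\mathrm{d}\theta}\mathcal{L}_{\mathtt{MSE}}\big(\phi_1^{(i)}(\theta),\theta;\mathcal{C}^{(i)}\big)
    = \nabla_\theta\mathcal{L}_{\mathtt{MSE}}\big(\phi_1^{(i)},\theta;\mathcal{C}^{(i)}\big)
    - \alpha\,\nabla^2_{\theta\phi}\mathcal{L}_{\mathtt{MSE}}\big(\phi_0^{(i)},\theta;\mathcal{C}_{\mathtt{red}}^{(i)}\big)\,\nabla_\phi\mathcal{L}_{\mathtt{MSE}}\big(\phi_1^{(i)},\theta;\mathcal{C}^{(i)}\big),
\end{equation}
where $\nabla^2_{\theta\phi}\mathcal{L}_{\mathtt{MSE}}$ denotes the matrix of mixed second derivatives with one row per entry of $\theta$ and one column per entry of $\phi$. The second term is the Hessian-vector product that requires the stored inner-loop graph, and in this sketch it only involves the reduced context set. As a hypothetical example, a $64\times64$ image provides $|\mathcal{C}^{(i)}|=4096$ coordinate-value pairs, so a selection ratio of $\gamma=0.25$ would leave $1024$ pairs in the inner-loop graph.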
%
%
%
% ---- Fitting Neural Fields to Signals ----
\paragraph{Fitting Neural Fields at Test Time}
\label{subsec:fitNF}
Given the meta-learned model parameters $\theta^{*}$, we fit an NF to each signal $s_1, \ldots, s_N$ by optimizing the signal-specific parameter vectors $\phi^{(1)}, \ldots, \phi^{(N)}$. For each signal, we initialize the signal-specific parameter vector as $\phi^{(i)}:=\mathbf{0}$ and optimize it for $H$ steps by minimizing $\mathcal{L}_{\mathtt{MSE}}(\phi^{(i)}, \theta^{*}; \mathcal{C}^{(i)})$. As no second-order optimization is required at test time (see \autoref{fig:meta_learn}), we can make use of the full context set $\mathcal{C}^{(i)}$, i.e., we use all the available information at test time. A set of NFs representing the signals $s_1, \ldots, s_N$ is therefore defined by the network architecture, the shared model parameters $\theta^{*}$, and the signal-specific parameters $\phi^{(1)}, \ldots, \phi^{(N)}$.
While meta-learning $\theta^{*}$ requires solving a complex optimization problem, fitting an NF at test time (i.e., optimizing $\phi^{(i)}$) simply requires $H$ SGD updates, which results in fast and low-resource inference ($<\SI{0.5}{\second}$ and $<\SI{1}{\giga\byte}$ GPU memory for a $64\times64$ image), a desirable property in medical applications.