
\looseness=-1\cite{waudby2024estimating} offer an in-depth study on detecting deviations in the means of bounded quantities in stream settings, providing useful tools (\eg~in terms of betting rate design) for the testing of risks following \autoref{eq:def-risk}. Very recently, \cite{fan2025testing} explore settings with additional bounds on the variance. Whereas they consider data to originate from a fixed source distribution $P$ with unknown mean $\mu$ to be estimated, our observations originate from variable, time-dependent distributions $P_t$, and we simultaneously monitor multiple means (corresponding to the threshold-dependent risks $\gR_t(\psi)$). Thus, the underlying hypothesis dictating our test design is subtly but distinctly different. Yet, strong results on the universal representation of test martingales (\eg~stated by \cite{waudby2024estimating}, Prop.~3) render the process structure in \autoref{eq:test-supermartingale} useful for a wide range of testing problems. We attempt to intuitively motivate this via a `forecasting game' in \autoref{sec:method}.

\paragraph{Sequential testing and risk monitoring.} \looseness=-1 \cite{xu2024active, adaptiveltt} leverage the above results on deviations in means to provide strong time-uniform risk control for \emph{i.i.d.} streams, discussed further in \autoref{subsec:connection-methods}. Closely related to our work, \cite{podkopaev2021tracking} monitor a \emph{running risk} of the form $\gR_r(\psi) = \frac{1}{t}\sum_{i=1}^{t} \mathbb{E}_{P_i}[\rz_i]$ under the sequential testing framework, with similar false alarm guarantees. However, we consider the more challenging instantaneous true risk $\gR_t(\psi)$ at any given time step, from which $\gR_r(\psi)$ can be recovered (by averaging over time steps) but not vice versa. Furthermore, their experimental design tends to distinguish between benign and harmful shifts caused by a dominant shift initiated at $P_0$ (akin to changepoint detection), whereas we incorporate a broader variety of shifts. Finally, we do not impose sample independence. Their approach was reformulated by \cite{amoukou2024sequential} for unlabelled streams, and relatedly \cite{bar2024protected} suggest an unsupervised covariate shift detector on the basis of entropy-matching. Specific to out-of-distribution detection, \cite{vishwakarma2024taming, sun2024online} also leverage martingale-based constructions.

\paragraph{Other sequential testing under shift.} \looseness=-1 Stream data denotes a particular test setting under the `testing-by-betting' framework, lending itself naturally to the use of test martingales\footnote{Also referred to therein as \emph{sequential anytime-valid inference}.} \citep{ramdas2023game, ramdas2024hypothesis}. Within that framework, previous work has considered a variety of testing problems, such as exchangeability \citep{vovk2021testing, saha2024testing}, independence \citep{podkopaev2023sequential}, two-sample testing \citep{Shekhar2021NonparametricTT, PandevaBNF24, PandevaFRS24, luo2024online}, or changepoint detection \citep{shekhar2023sequential, shekhar2024reducing, vovk2021retrain, volkhonskiy2017inductive, shin2023detectors}. These works differ from ours in terms of the hypotheses they address, their data settings (\eg~by simultaneously observing two separate data streams), or their experimental designs (\eg~detecting a single changepoint or shift). Additional related works on static risk control and extensions to stream settings not using sequential testing can be found in \autoref{app:background}.
