
\paragraph{Static risk control and extensions.} \looseness=-1 Perhaps most prominently, the framework of \emph{conformal prediction} constructs set predictors with upper bounds specifically on the miscoverage risk under \emph{i.i.d.} or exchangeable data, and has generated a substantial recent body of literature (see, \eg, \cite{angelopoulos2023conformal, fontana2023conformal}). Extensions to more general bounded risks include \cite{angelopoulos2024crc, bates2021distribution, angelopoulos2021learn}, which leverage different concentration results, whereas \cite{angelopoulos2023prediction} explore the use of unlabelled calibration data. Recent work on conformal prediction also addresses shifted or non-exchangeable data sequences, either by tracking and updating the tolerated miscoverage rate \citep{Gibbs2021AdaptiveCI, angelopoulos2024online, zaffran2022adaptive} or by applying different weighting schemes \citep{barber2023conformal, guan2023localized}. The settings considered include covariate shift \citep{tibshirani2019conformal}, label shift \citep{podkopaev2021distribution}, and their abstraction to a more general shift \citep{prinster2024conformal}. \cite{Feldman2022AchievingRC} explore shift settings for more general bounded risks, and we elaborate on this connection in \autoref{subsec:connection-methods}.

\paragraph{Sequential hypothesis testing and risk tracking.}\looseness=-1 \cite{waudby2024estimating} offer an in-depth study on detecting deviations in the means of bounded quantities in streaming settings, providing fundamental tools for the testing of risks following \autoref{eq:def-risk}. Very recently, \cite{fan2025testing} explore some settings with additional bounds on the variance. \cite{xu2024active, adaptiveltt} leverage deviations in means to provide strong time-uniform risk control for \emph{i.i.d.} streams, discussed further in \autoref{subsec:connection-methods}. Closely related to our work, \cite{podkopaev2021tracking} monitor a \emph{running risk} of the form $\gR_r(\psi) = \frac{1}{t}\sum_{i=1}^{t} \mathbb{E}_{P_i}[\rvz_i]$ under the sequential testing framework, with similar false alarm guarantees. However, we consider the more challenging instantaneous risk $\gR_t(\psi)$ at any given time step, which can recover $\gR_r(\psi)$ but not vice versa. Furthermore, their experimental design tends to distinguish between benign and harmful shifts caused by a dominant shift initiated at $P_0$ (akin to changepoint detection), whereas we incorporate a broader variety of shifts. Their approach was reformulated by \cite{amoukou2024sequential} for unlabelled streams, and relatedly \cite{bar2024protected} suggest an unsupervised covariate shift detector on the basis of entropy-matching to a reference set. Other related work includes \cite{weinstein2020online} for tracking of risks associated with population parameters (rather than predictive quantities), and \cite{vishwakarma2024taming, sun2024online}, who leverage martingale-based constructions particular to out-of-distribution detection.
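
To illustrate the relation between the two risk notions, suppose (as an assumption consistent with the summands above) that the instantaneous risk at step $i$ is $\gR_i(\psi) = \mathbb{E}_{P_i}[\rvz_i]$, and let $\alpha$ denote an illustrative tolerated risk level. If $\gR_i(\psi) \le \alpha$ holds for every $i \le t$, then
\[
    \gR_r(\psi) \;=\; \frac{1}{t}\sum_{i=1}^{t} \gR_i(\psi) \;\le\; \frac{1}{t}\sum_{i=1}^{t} \alpha \;=\; \alpha ,
\]
so control of the instantaneous risk carries over to the running risk. The converse can fail: for a risk bounded in $[0,1]$ with $t = 2$, $\gR_1(\psi) = 0$ and $\gR_2(\psi) = 1$, the running risk equals $\tfrac{1}{2}$ and satisfies a level $\alpha = \tfrac{1}{2}$, whereas the instantaneous risk at $t = 2$ exceeds it.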

\paragraph{Sequential hypothesis testing under shift.} \looseness=-1 Stream data denotes a particular test setting under the `testing-by-betting' framework, lending itself naturally to the use of test martingales\footnote{It is also referred to as \emph{sequential anytime-valid inference}.} \citep{ramdas2023game, ramdas2024hypothesis}. Within that framework, previous work has considered a variety of testing problems, such as testing exchangeability \citep{vovk2021testing, saha2024testing}, independence \citep{podkopaev2023sequential}, two-sample testing \citep{Shekhar2021NonparametricTT, PandevaBNF24, PandevaFRS24, luo2024online}, or changepoint detection \citep{shekhar2023sequential, shekhar2024reducing, vovk2021retrain, volkhonskiy2017inductive, shin2023detectors}. These works differ from ours in terms of the hypotheses they address, their data settings (\eg~by simultaneously observing two separate data streams), or their experimental designs (\eg~detecting a single changepoint or shift).