
\looseness=-1We investigate sequential testing-based approaches to monitor an unobservable, time-dependent bounded risk $\gR_t(\psi)$ in a dynamic data stream setting subject to unknown and repeated distribution shifts. Motivated by the `testing by betting' framework \citep{waudby2024estimating}, our martingale-based monitoring process ensures timely detection of risk violations whilst providing finite-sample control over the false alarm rate. This yields a statistically rigorous yet practical procedure for risk monitoring.

\looseness=-1However, our safety assurances are inherently limited by the unpredictability of any occurring shift and by the minimal assumptions we impose on it. More informative forward-looking assurances can potentially be obtained if additional restrictions are considered, such as constraints on the shift origin or on its intensity and growth rate. Similarly, rephrasing our problem statement in terms of a different hypothesis might simplify the task and offer more efficient or \emph{unsupervised} monitoring, \eg~by drawing inspiration from \cite{bar2024protected}'s entropy-matching idea or \cite{amoukou2024sequential}'s label-free quantile test. Possible unsupervised extensions may include recasting the task as two-sample testing \citep{PandevaBNF24,PandevaFRS24}, using generalization estimation \citep{baek2022agreement, rosenfeld2023almost}, or leveraging calibration properties \citep{gupta2020distribution}. In addition, the model update step can be integrated into the framework, \eg~via test-time adaptation \citep{schirmer2024test} or online and continual learning principles \citep{wang2024comprehensive}. Ultimately, providing practical safety assurances for robust model behaviour at deployment \emph{under arbitrary shift} remains a challenging problem \citep{fang2022out}.