\subsection{Sensitivity Analysis}\label{sec:sensitivity}


\begin{figure}
    \centering
    \textsf{Continuity Assumption}
    \includegraphics[width=\linewidth]{figures/potential-outcomes.pdf}
    \caption{Continuity in the outcome counterfactual probabilities is set in terms of a norm on realized action trajectories.}
    \label{fig:sensitivity}
\end{figure}




We begin our sensitivity analysis by considering the counterfactual distribution $P_{Y(\pi)|\Pi,X}$. Here $\Pi$ denotes the exposure, meaning the realized action trajectory, which we distinguish from the intervention $\pi$.

Partial identification is enabled by a mild continuity argument, which follows the reasoning behind the classical marginal sensitivity model (MSM) for binary exposures~\citep{tan}. The MSM constrains the Radon-Nikodym derivative between the two possible (binary) counterfactuals. To extend this idea to continuous exposure domains, \citet{marmarelis23} recently proposed local bounds for nearby counterfactuals. Concretely, their $\delta$MSM is derived by assuming a constraint on
\begin{equation}\label{eq:infinitesimal-sensitivity}
    \dv{P_{Y(\pi)|\Pi=\alpha+\delta,X=x}}{P_{Y(\pi)|\Pi=\alpha,X=x}} \approx 1,
\end{equation}
for a sufficiently small displacement $\delta$ in the space of exposures; the expression is adapted here to this paper's notation. Our analysis is inspired by a vector-valued extension of the $\delta$MSM applied to action trajectories. Equation~\eqref{eq:infinitesimal-sensitivity} places a form of continuity on the counterfactual densities with respect to the observed trajectory $\Pi$ at any feasible value $\alpha$, for any potential trajectory $\pi$ and history $x$. In other words, it describes how much the distribution of rewards for a \emph{potential} trajectory $\pi$, denoted $Y(\pi)$, could change under a perturbation of the \emph{realized} action trajectory $\alpha$. Any statistical dependence between $Y(\pi)$ and $\Pi$, conditioned on $X$, could only occur through hidden confounders that violate ignorability. %

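As a brief sanity check on Equation~\eqref{eq:infinitesimal-sensitivity}, consider the unconfounded case. If ignorability held, so that $Y(\pi) \perp\!\!\!\perp \Pi \mid X$, the conditioning on $\Pi$ would drop out and the ratio would equal one exactly, for any $\delta$ and not only for small ones:
\begin{equation*}
    \dv{P_{Y(\pi)|\Pi=\alpha+\delta,X=x}}{P_{Y(\pi)|\Pi=\alpha,X=x}}
    = \dv{P_{Y(\pi)|X=x}}{P_{Y(\pi)|X=x}} = 1.
\end{equation*}
Deviations of this ratio from one can therefore be read as a measure of how strongly hidden confounding couples $Y(\pi)$ to $\Pi$.
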

In a POMDP setting, the hidden state behaves as a hidden confounder whenever it affects the reference policy---which generates the offline realized trajectories---and the reward. It impedes identification of the reward for potential trajectories that are off-reference-policy, as in online planning. %
So far, the literature on causal sensitivity models has failed to provide an approach to partial identification that is \emph{generally applicable} while also \emph{adapting its bounds} based on how off-policy the counterfactuals in question really are.

A recently popular sensitivity model that accommodates vector-valued exposures, and therefore action trajectories, is the CMSM~\citep{frauen2024sharp,jesson22}. While simple and surprisingly effective, the CMSM has no way to quantify whether some trajectories are more on-policy or off-policy than others, so its bounds do not discriminate between them. The $\delta$MSM, on the other hand, may provide a starting point for an adaptive sensitivity model because it considers continuity between nearby counterfactuals. We deviate from the original infinitesimal formulation of the $\delta$MSM and consider exposures and interventions in a general normed vector space of action trajectories.


First we re-frame the arguments $(\alpha,\delta)$ by setting $\alpha=\pi$ and $\delta=\Pi-\pi$, so that $\alpha+\delta=\Pi$ and $P_{Y(\pi)|\Pi=\alpha,X=x}$ becomes the identifiable quantity
$P_{Y(\pi)|\Pi=\pi,X=x}=P_{Y|\Pi=\pi,X=x},$
where the equality holds by consistency of potential outcomes.
Equation~\eqref{eq:infinitesimal-sensitivity} then becomes
\begin{equation}\label{eq:new-sensitivity}
    \dv{P_{Y(\pi)|\Pi,X=x}}{P_{Y|\Pi=\pi,X=x}} \approx 1.
\end{equation}
This constraint must hold almost everywhere in the joint probability space of $\big(Y(\pi),Y,\Pi\big)$, and for any $(\pi, x)$ in the support of $P_{\Pi,X}$. It is instructive to think of $(\pi,x)$ as fixed and $\big(Y(\pi),Y,\Pi\big)$ as a triplet of random variables. This framing corresponds to the decision-making context, where for a given ``state'' $x$ we seek to evaluate possible interventions $\pi$. The existence of this Radon-Nikodym derivative is guaranteed under the mild condition that all counterfactuals have identical support in the outcome space $\mathcal{Y}$, which is shared by all potential and realized outcomes~\citep{kallenberg}.



Suppose that a norm is defined over trajectories. If the counterfactual log-probability density functions could be assumed to be continuous in the realized trajectory $\Pi$, then the Radon-Nikodym derivative of Equation~\eqref{eq:new-sensitivity} could be constrained via Lipschitz continuity in $\norm{\Pi-\pi}$ as illustrated in Figure~\ref{fig:sensitivity}.
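
For intuition only, one possible choice of norm can be written out explicitly. This is an illustrative assumption and not necessarily the norm adopted elsewhere in this paper. Treat a trajectory as a sequence of $T$ real-valued action vectors, with $\Pi=(A_1,\dots,A_T)$ and $\pi=(a_1,\dots,a_T)$, where the symbols $A_t$, $a_t$, and $T$ are hypothetical and introduced here only for this example. The Euclidean norm of the stacked difference is then
\begin{equation*}
    \norm{\Pi-\pi} = \Big(\sum_{t=1}^{T} \norm{A_t-a_t}_2^2\Big)^{1/2}.
\end{equation*}
Under such a choice, a potential trajectory that matches the realized one at most time steps lies close to it, whereas one that differs at every step lies far away.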





\begin{tcolorbox}
\begin{definition}[Sensitivity Model]\label{def:sensitivity-model}
    Let $\Gamma\geq 1$ be the smallest constant such that
    \begin{equation*}\label{eq:sensitivity-model}
        \abs{\log\dv{P_{Y(\pi)|\Pi,X}(Y\mid\Pi,X)}{P_{Y|\Pi,X}(Y\mid\pi,X)}} \leq \norm{\Pi-\pi}\log\Gamma
    \end{equation*} %
    almost everywhere, and for any action trajectory $\pi$. The scalar $\Gamma$ is the sensitivity parameter for this model.
\end{definition}
\end{tcolorbox}
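
Exponentiating the inequality in Definition~\ref{def:sensitivity-model} gives an equivalent two-sided bound on the density ratio, which may be easier to read. This is a direct algebraic restatement and not an additional assumption:
\begin{equation*}
    \Gamma^{-\norm{\Pi-\pi}}
    \;\leq\;
    \dv{P_{Y(\pi)|\Pi,X}(Y\mid\Pi,X)}{P_{Y|\Pi,X}(Y\mid\pi,X)}
    \;\leq\;
    \Gamma^{\norm{\Pi-\pi}}.
\end{equation*}
Two special cases follow immediately. When $\Pi=\pi$ the exponent vanishes and the ratio is pinned to one, in agreement with the identifiable on-policy quantity. When $\Gamma=1$ the ratio equals one for every $\Pi$, which recovers the unconfounded case. Between these extremes, the admissible interval widens exponentially in $\norm{\Pi-\pi}$.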

