\section{Discussion}\label{sec:discussion}
The empirical results (\S\ref{sec:eval}) support two key findings. First, there is value in augmenting model predictive control (MPC) with a causal sensitivity analysis, even in realistic settings. Second, the proposed sensitivity model (\S\ref{sec:sensitivity}), which leverages a norm in the action-trajectory space, is more helpful to MPC than more classical sensitivity models derived from the marginal sensitivity model (MSM)~\citep{tan}. We also instantiate the partially identified MPC algorithm in the form of an augmented model predictive path integral (MPPI) controller. MPPI is employed in several deep model-based RL algorithms that achieve the state of the art~\citep[e.g.,][]{hansen2024tdmpc}.

The theoretical results (\S\ref{sec:theory}) motivate our sensitivity model in the context of recent developments in causal inference, and demonstrate the flexibility of the potential-outcomes notation. Our analysis (\S\ref{sec:partial-identification}) reveals that the sharp partial identification is relatively simple, computationally tractable, and leads to minimax controllers (\S\ref{sec:control}), in the sense that they find the best policy for the worst-case scenario~\citep{kallus2021minimax}.
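
As a schematic reminder of what ``minimax'' means here, and not a restatement of the formal result, the controller can be read as solving a problem of the following shape. The symbols are illustrative assumptions rather than the notation of \S\ref{sec:control}: $a_{1:H}$ denotes a candidate action trajectory over a planning horizon $H$, $\mathcal{P}(\Gamma)$ the set of outcome distributions compatible with the sensitivity model at level $\Gamma$, and $R$ the cumulative reward.
\[
a^{\star}_{1:H} \in \arg\max_{a_{1:H}} \; \min_{P \in \mathcal{P}(\Gamma)} \; \mathrm{E}_{P}\left[ R \mid a_{1:H} \right].
\]
The inner minimization corresponds to the worst-case scenario, and the sharp partial identification of \S\ref{sec:partial-identification} is what supplies this lower bound in a tractable form.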

Notably, unlike previous work on sensitivity analysis for off-policy evaluation and learning, our approach \emph{does not require the hidden confounders to be memoryless or static}~\citep{kausik2024offline}. Instead, it allows the domain expert to select a norm that suits the action-trajectory space on which they wish to design an MPC algorithm. We look forward to further studying how to select these norms for different processes.

\subsection{Future Work}
Several distinct avenues exist for extending the current work.
We consider two main threads: generative AI and online calibration.

\paragraph{Generative AI.} 
The proposed methodology is general enough to suit various modalities, including text through large language models (LLMs). 
It appears that fine-tuning LLMs for specific tasks often reduces the diversity of their generations~\citep{mohammadi2024creativity,kirk2024understanding}.
For this and other practical reasons, it may be more useful to combine an LLM foundation model with our sensitivity analysis, so that agents can solve tasks that are novel to the LLM.
Simple algorithms in the spirit of MPC already find success in guiding LLMs~\citep{beirami2024theoretical}.

\paragraph{Online calibration.}
This paper considers the problem of partial identification and minimax control under a general class of sensitivity models. While the empirical evaluations show the utility of a calibrated sensitivity model, they do not show \emph{how} to calibrate it online (its $\Gamma$ parameter, or its choice of norm). Numerous established methods for online calibration exist, including bandit algorithms.
The sensitivity model's parameters are extremely low-dimensional and should therefore be easy to learn online; learning them should also be much more data-efficient than wholesale online reinforcement learning.
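
One hedged illustration of the bandit route, which this paper does not evaluate, is the following. Discretize the sensitivity parameter into a finite grid $\Gamma_1,\dots,\Gamma_K$ and treat each value as an arm; the grid, the index $k$, and the rescaling of episode returns to $[0,1]$ are all assumptions made for this sketch. After each arm has been tried once, the standard UCB1 rule selects, at episode $t$,
\[
k_t \in \arg\max_{k \in \{1,\dots,K\}} \left( \hat{\mu}_k + \sqrt{\frac{2 \ln t}{n_k}} \right),
\]
where $\hat{\mu}_k$ is the average rescaled return observed so far when planning with $\Gamma_k$, and $n_k$ is the number of episodes in which $\Gamma_k$ was used. The same construction would apply to a finite menu of candidate norms, with one arm per norm.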

\subsection{Limitations}
While we expand the existing theory on causal sensitivity analysis to make partial identification more data-adaptive, especially in the novel action-trajectory setting, there are still fundamental limitations to the family of sensitivity models related to the MSM~\citep{huang2025variance}.
Our model shares the shortcomings of a pointwise hard constraint across all counterfactuals, namely that a $\Gamma$ large enough to cover all possible hidden confounders can be untenable to use. In practice, there tends to be a $\Gamma$ that is most helpful for achieving positive rewards, and this could be lower than the true $\Gamma$. A natural next step for this line of work is to turn the sensitivity model's constraint into a probabilistic statement, which would increase its flexibility, especially at the tails of the conditional outcome distributions.
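
For reference, the classical MSM illustrates the kind of pointwise constraint at issue; the model of \S\ref{sec:sensitivity} is defined over action trajectories and is not reproduced here. In its usual binary-treatment form, written in assumed notation with nominal propensity $e(x)$, complete propensity $e(x,u)$, and hidden confounder $u$, the MSM requires
\[
\frac{1}{\Gamma} \;\le\; \frac{e(x)/\bigl(1-e(x)\bigr)}{e(x,u)/\bigl(1-e(x,u)\bigr)} \;\le\; \Gamma \qquad \mbox{for all } (x,u).
\]
A probabilistic relaxation, offered only as a sketch and not as a worked-out proposal, would ask that this bound hold with probability at least $1-\delta$ over $(x,u)$ for some tolerance $\delta \in (0,1)$, rather than everywhere.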



\section{Conclusion}
Our causal sensitivity analysis of action trajectories bridges recent developments in causal inference with off-policy learning. Model predictive control is becoming more popular for learning generalizable agents, and our contribution on dealing with partial observability is a promising step towards making such agents more reliable in the real world.
