['1c1', '< Title: AMORTIZED CONTROL OF CONTINUOUS STATE SPACE FEYNMAN-KAC MODEL FOR IRREGULAR TIME SERIES', '---', '> Title: AMORTIZED STOCHASTIC CONTROL FOR CONTINUOUS-TIME FEYNMAN-KAC MODELS WITH IRREGULAR OBSERVATIONS', '5,10c5,20', '< Section: ', "< conditioned SDEs by approximating the Doob's h-transform. It allow us to propose a tight evidence lower bound (ELBO) for the aforementioned VI algorithm by establishing a fundamental connection between the partial differential equations (PDEs) associated with Doob's h-transform and SOC. The Doob's h-transform often referred to as the twist-function in Sequential Monte Carlo (SMC) literature (Guarniero et al., 2017) to approximate the smoothing distributions. Building on this, (Heng et al., 2020) introduced an algorithm to approximate the twisted transition kernel directly, while a recent concurrent study (Lu & Wang, 2024) extended this approach to continuous-time settings. However, both studies primarily emphasize approximation methods rather than practical applications.", '< In practical situations, the computation of ELBO for a VI algorithm might impractical due to the instability and high memory demands associated with gradient computation of the approximated stochastic dynamics over the entire sequence interval (Liu et al., 2024;Park et al., 2024). To address this issue, we propose two efficient modeling approaches: 1) We establish amortized inference by introducing an auxiliary variable to the latent space, generated by a neural network encoder-decoder. It maps the high-dimensional time-series into a suitable low-dimensional space, allowing more flexible parameterization of the latent dynamics. Moreover, amortization allows the inference of the posterior distribution for a novel time-series sequence without relying on Bayesian recursion by incorporating the learned control function. 
2) We leverage the simulation-free property, which enables closed-form sampling from intermediate latent marginal distributions that can be computed in a temporally parallel way. Additionally, we explore a more flexible linear approximation of the drift function in controlled SDEs to enhance the efficiency of the proposed controlled dynamics.', '< We evaluated ACSSM on several time-series tasks across various real-world datasets. Our experiments show that ACSSM consistently outperforms existing baseline models in each tasks, demonstrating its effectiveness in capturing the underlying dynamics of irregular time-series. Additionally, ACSSM achieves significant computational efficiency, enabling faster training times compared to dynamicsbased models that rely on numerical simulations. A summary of the key concepts of ACSSM, along with related works, is provided in Appendix A. We summarize our contributions as follows:', "< • We extend the theory of Doob's h-transform to a multi-marginal cases. This indicates the existence of a class of conditioned SDEs that depend on future observations, where the solutions of these SDEs lead to the true posterior path measure within the framework of CD-SSM. • We reformulate the simulation of conditioned SDEs as a SOC problem to approximating an impractical Doob's h-transform. By leveraging the connection between SOC theory and Doob's h-transform, we propose a variational inference algorithm with a tight ELBO. • For practical real-world applications, we introduce an efficient and scalable modeling approach that enables parallelization of latent dynamic simulation and ELBO computation. 
• We demonstrate its superior performance across various real-world irregularly sampled timeseries tasks, including per-time point classification, regression, and sequence interpolation and extrapolation, all with computational efficiency.", '< Notation Throughout this paper, we denote path measure by P (•) , defined on the space of continuous functions Ω = C([0, T ], R d ). We sometimes denote with P the expectation as E t,x', '---', '> Section: Introduction', '> Modeling irregular time series data presents a persistent challenge across numerous scientific and engineering disciplines. Such data, characterized by non-uniform sampling intervals and missing observations, often arises in critical applications like healthcare, climate monitoring, and financial markets. Accurately capturing the underlying continuous dynamics from these sparse and asynchronous measurements is crucial for reliable prediction, classification, and inference.', '> ', "> A key challenge lies in simulating SDEs conditioned on irregularly observed data, which requires approximating the intractable Doob's h-transform. To address this, we reformulate the problem within the framework of stochastic optimal control (SOC). By establishing a fundamental connection between the partial differential equations (PDEs) associated with Doob's h-transform and SOC, we propose a variational inference (VI) algorithm with a tight evidence lower bound (ELBO). 
The Doob's h-transform is often referred to as the twist-function in the Sequential Monte Carlo (SMC) literature (Guarniero et al., 2017), where it is used to approximate smoothing distributions. Heng et al. (2020) introduced an algorithm to approximate the twisted transition kernel directly, and a concurrent study (Lu & Wang, 2024) extended this approach to continuous-time settings; however, both studies primarily focus on approximation methods rather than practical applications.", '> ', '> Computing the ELBO for VI algorithms can be impractical due to instability and high memory demands, especially when differentiating through approximated stochastic dynamics over long sequence intervals (Liu et al., 2024; Park et al., 2024). To mitigate these issues and improve scalability for real-world applications, ACSSM incorporates two efficient modeling strategies:', '> 1.  **Amortized Inference with Auxiliary Variables**: We introduce an auxiliary variable into the latent space, generated by a neural network encoder-decoder. This maps high-dimensional time series data into a suitable low-dimensional representation, enabling more flexible parameterization of the latent dynamics. By incorporating the learned control function, amortization further allows efficient inference of the posterior distribution for new time-series sequences without relying on computationally intensive Bayesian recursion.', '> 2.  **Simulation-Free Latent Dynamics and Parallel Computation**: We leverage a simulation-free property of our chosen dynamics, which enables closed-form sampling from intermediate latent marginal distributions. This allows the latent states and the ELBO to be computed in parallel across time, significantly improving efficiency. 
Additionally, we explore a flexible linear approximation of the drift function in controlled SDEs to further enhance the efficiency of the proposed controlled dynamics.', '> ', '> We demonstrate the effectiveness of ACSSM through extensive empirical evaluations on various real-world datasets across several time-series tasks, including classification, regression, interpolation, and extrapolation. Our experiments show that ACSSM consistently outperforms existing baseline models, effectively capturing the underlying dynamics of irregular time series, while achieving faster training than dynamics-based models that rely on numerical simulation. A summary of the key concepts of ACSSM, along with related works, is provided in Appendix A. Our main contributions are summarized as follows:', "> •   We extend the theory of Doob's h-transform to multi-marginal cases, demonstrating the existence of a class of conditioned SDEs that depend on future observations and whose solutions induce the true posterior path measure within the framework of Continuous-Discrete State Space Models (CD-SSM).", "> •   We reformulate the simulation of conditioned SDEs as an SOC problem in order to approximate the intractable Doob's h-transform. 
By leveraging the fundamental connection between SOC theory and Doob's h-transform, we propose a novel variational inference algorithm with a tight ELBO.", '> •   For practical real-world applications, we introduce an efficient and scalable modeling approach that enables parallelization of latent dynamics simulation and ELBO computation.', "> •   We demonstrate ACSSM's superior performance across various real-world irregularly sampled time-series tasks, including per-time-point classification, regression, and sequence interpolation and extrapolation, all while maintaining computational efficiency.", '> ', '> Notation: Throughout this paper, we denote a path measure by P^(•), defined on the space of continuous functions Ω = C([0, T], ℝ^d). We sometimes denote with E_P the expectation as E^{t,x}', '12c22', '< where the stochastic processes corresponding to P (•) are represented as X (•) and their timemarginal distribution at time t ∈ [0, T ] is given by the push-forward measure µ (•)', '---', '> where the stochastic processes corresponding to P^(•) are represented as X^(•) and their time-marginal distribution at time t ∈ [0, T] is given by the push-forward measure µ^(•)', '16,19c26,32', '< t (x)dL(x), where L denotes the Lebesgue measure. Additionally, for a function V : [0, T ]×R d → R, we define the first and second derivatives with respect to x ∈ R d as ∇ x V and ∇ xx V, respectively, and the derivative with respect to time t ∈ [0, T ] as ∂ t V. For a sequence of functions {V i } i∈[1:k] , we will denote V i (t, x) := V i,t and [1 : k] = {1, • • • , k}. Finally, the Kullback-Leibler (KL) divergence between two probability measures µ and ν is defined as D KL (µ|ν) = R d log dµ dν (x)dµ(x) when µ is absolutely continuous with respect to ν, and D KL (µ|ν) = +∞ otherwise. 
continuous-time Markov state trajectory X 0:T in latent space R d is given as a solution of the SDE:', '< (Prior State) dX t = b(t, X t )dt + dW t ,(1)', '< where X 0 ∼ µ 0 and {W t } t∈[0,T ] is a R d -valued Wiener process that is independent of the µ 0 . Since X t is Markov process, the time-evolution of marginal distribution µ t is governed by a transition density, which is the solution to the Fokker-Planck equation assocaited with X t . This allows us to define a path measure P that represent the weak solutions of the SDE in (1) over an interval [0, T ]1 .', '< For a measurement model g i (y ti |X ti ), we consider the case that we have only access to the realization of the (latent) observation process at each discrete-time stamps {t i } i∈[1:k] , i.e., y ti ∼ g i (y ti |X ti ), ∀i ∈ [1 : k]. In this paper, our goal is to infer the classes of SDEs which inducing the filtering/smoothing path measure P ⋆ := P ⋆ (•|H t k ), the conditional distribution over the interval [0, T ] for a given P and a set of observations up to time t k , H t k = {y ti |i ≤ k}:', '---', '> t (x)dL(x), where L denotes the Lebesgue measure. Additionally, for a function V : [0, T] × ℝ^d → ℝ, we define the first and second derivatives with respect to x ∈ ℝ^d as ∇_x V and ∇_{xx} V, respectively, and the derivative with respect to time t ∈ [0, T] as ∂_t V. For a sequence of functions {V_i}_{i∈[1:k]}, we denote V_i(t, x) := V_{i,t} and [1 : k] = {1, …, k}. Finally, the Kullback-Leibler (KL) divergence between two probability measures µ and ν is defined as D_KL(µ‖ν) = ∫_{ℝ^d} log (dµ/dν)(x) dµ(x) when µ is absolutely continuous with respect to ν, and D_KL(µ‖ν) = +∞ otherwise.', '> ', '> A continuous-time Markov state trajectory X_{0:T} in the latent space ℝ^d is given as a solution of the SDE:', '> (Prior State) dX_t = b(t, X_t) dt + dW_t, (1)', '> where X_0 ∼ µ_0 and {W_t}_{t∈[0,T]} is an ℝ^d-valued Wiener process that is independent of µ_0. Since X_t is a Markov process, the time evolution of its marginal distribution µ_t is governed by a transition density, which solves the Fokker-Planck equation associated with X_t. This allows us to define a path measure P, the law of the weak solution of the SDE in (1) over an interval [0, T].¹', '> ', '> For a measurement model g_i(y_{t_i} | X_{t_i}), we consider the case where we only have access to realizations of the (latent) observation process at discrete time stamps {t_i}_{i∈[1:k]}, i.e., y_{t_i} ∼ g_i(y_{t_i} | X_{t_i}), ∀i ∈ [1 : k]. In this paper, our goal is to infer the class of SDEs that induces the filtering/smoothing path measure P^⋆ := P^⋆(·|H_{t_k}), i.e., the conditional distribution over the interval [0, T] given P and the set of observations up to time t_k, H_{t_k} = {y_{t_i} | i ≤ k}:', '21c34', '< where the normalizing constant Z(H t k ) = E P K i=1 g i (y ti |X ti ) serve as a observations likelihood. The path measure formulation of the posterior distribution described in (2) referred to as Feynman-Kac models. See (Del Moral, 2011;Chopin et al., 2020) for a more comprehensive understanding.', '---', '> where the normalizing constant Z(H_{t_k}) = E_P[∏_{i=1}^{k} g_i(y_{t_i} | X_{t_i})] serves as the observation likelihood. The path measure formulation of the posterior distribution in (2) is referred to as a Feynman-Kac model. See (Del Moral, 2011; Chopin et al., 2020) for a more comprehensive treatment.', '973d985', '< ']
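The Feynman-Kac normalizing constant Z(H_{t_k}) = E_P[∏_{i=1}^{k} g_i(y_{t_i} | X_{t_i})] can be illustrated concretely. The minimal sketch below estimates it by plain Monte Carlo under the prior path measure P and compares the estimate with its exact value. It assumes a hypothetical one-dimensional Ornstein-Uhlenbeck prior, b(t, x) = -θx in (1), a Gaussian initial law µ_0 = N(m0, v0), and Gaussian measurement models g_i(y | x) = N(y; x, r). In this linear-Gaussian special case, a scalar Kalman filter gives log Z exactly. These modeling choices and all function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random

# Assumption (illustrative only): 1-D OU prior dX = -theta * X dt + dW,
# X_0 ~ N(m0, v0), and Gaussian likelihoods g_i(y | x) = N(y; x, r).


def gauss_logpdf(y, mean, var):
    """Log-density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def ou_transition(mean, var, theta, dt):
    """Push N(mean, var) forward through the OU dynamics for a time dt."""
    a = math.exp(-theta * dt)
    return a * mean, a * a * var + (1.0 - a * a) / (2.0 * theta)


def kalman_log_z(ts, ys, theta, m0, v0, r):
    """Exact log Z in the linear-Gaussian case via a scalar Kalman filter."""
    m, v, t_prev, log_z = m0, v0, 0.0, 0.0
    for t, y in zip(ts, ys):
        m, v = ou_transition(m, v, theta, t - t_prev)  # predict to t_i
        s = v + r                                      # predictive variance of y_i
        log_z += gauss_logpdf(y, m, s)                 # add log p(y_i | y_{<i})
        gain = v / s
        m, v = m + gain * (y - m), (1.0 - gain) * v    # condition on y_i
        t_prev = t
    return log_z


def monte_carlo_log_z(ts, ys, theta, m0, v0, r, n_paths=20000, seed=0):
    """Estimate log E_P[prod_i g_i(y_i | X_{t_i})] by sampling prior paths."""
    rng = random.Random(seed)
    log_weights = []
    for _ in range(n_paths):
        x = rng.gauss(m0, math.sqrt(v0))
        t_prev, log_w = 0.0, 0.0
        for t, y in zip(ts, ys):
            # exact OU transition from the point x, so no time-stepping error
            mean, var = ou_transition(x, 0.0, theta, t - t_prev)
            x = rng.gauss(mean, math.sqrt(var))
            log_w += gauss_logpdf(y, x, r)             # log g_i(y_i | X_{t_i})
            t_prev = t
        log_weights.append(log_w)
    top = max(log_weights)                             # log-mean-exp for stability
    return top + math.log(sum(math.exp(w - top) for w in log_weights) / n_paths)


ts, ys = [0.4, 1.1, 2.5], [0.3, -0.2, 0.5]            # irregular time stamps
exact = kalman_log_z(ts, ys, theta=1.0, m0=0.0, v0=1.0, r=0.5)
approx = monte_carlo_log_z(ts, ys, theta=1.0, m0=0.0, v0=1.0, r=0.5)
print(exact, approx)
```

Sampling paths from the prior and weighting them by the likelihoods works here because there are only three moderately informative observations. With many or sharply informative observations, the weights degenerate. That is the kind of setting in which twisted (h-transformed) dynamics are meant to help.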

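The simulation-free strategy relies on latent dynamics whose intermediate marginals are available in closed form. The minimal sketch below illustrates that idea on a toy linear SDE. It assumes a hypothetical one-dimensional Ornstein-Uhlenbeck prior dX_t = -θX_t dt + dW_t (a linear choice of b in (1)) with a Gaussian initial law N(m0, v0). This is not the paper's parameterization of the controlled drift, and all function names are hypothetical. The closed-form marginals at irregular time stamps are computed independently of one another and then checked against a sequential Euler-Maruyama simulation.

```python
import math
import random

# Assumption (illustrative only): 1-D OU dynamics dX = -theta * X dt + dW
# with X_0 ~ N(m0, v0); not the controlled drift used by ACSSM.


def ou_marginal(t, theta, m0, v0):
    """Closed-form mean and variance of X_t."""
    a = math.exp(-theta * t)
    return a * m0, a * a * v0 + (1.0 - a * a) / (2.0 * theta)


def closed_form_marginals(ts, theta, m0, v0):
    """Marginals at arbitrary (irregular) time stamps.

    Each entry depends only on its own t, so the entries could be computed
    in parallel across time; no sequential solver is involved.
    """
    return [ou_marginal(t, theta, m0, v0) for t in ts]


def euler_maruyama_moments(t_end, theta, m0, v0, n_steps=100, n_paths=4000, seed=1):
    """Reference: sequential Euler-Maruyama simulation, then sample moments."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    xs = []
    for _ in range(n_paths):
        x = rng.gauss(m0, math.sqrt(v0))
        for _ in range(n_steps):                       # must step through time in order
            x += -theta * x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(x)
    mean = sum(xs) / n_paths
    var = sum((x - mean) ** 2 for x in xs) / (n_paths - 1)
    return mean, var


ts = [0.2, 1.3, 3.7]                                   # irregular time stamps
marginals = closed_form_marginals(ts, theta=1.0, m0=1.0, v0=2.0)
em_mean, em_var = euler_maruyama_moments(1.3, theta=1.0, m0=1.0, v0=2.0)
print(marginals[1], (em_mean, em_var))
```

The contrast is the point of the example. Euler-Maruyama has to advance every path through all intermediate steps in order. The closed-form marginals at different time stamps do not depend on one another and could be evaluated as a single batched operation.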