['1c1', '< Title: GENERATIVE MODELING WITH PHASE STOCHASTIC BRIDGES', '---', '> Title: PHASE STOCHASTIC BRIDGES: ACCELERATED GENERATIVE MODELING VIA OPTIMAL CONTROL IN PHASE SPACE', '3c3', "< Abstract: We introduce a novel generative modeling framework grounded in phase space dynamics, taking inspiration from the principles underlying Critically damped Langevin Dynamics and Bridge Matching. Leveraging insights from Stochastic Optimal Control, we construct a more favorable path measure in the phase space that is highly advantageous for efficient sampling. A distinctive feature of our approach is the early-stage data prediction capability within the context of propagating generative Ordinary Differential Equations or Stochastic Differential Equations. This early prediction, enabled by the model's unique structural characteristics, sets the stage for more efficient data generation, leveraging additional velocity information along the trajectory. This innovation has spurred the exploration of a novel avenue for mitigating sampling complexity by quickly converging to realistic data samples. Our model yields comparable results in image generation and notably outperforms baseline methods, particularly when faced with a limited Number of Function Evaluations. Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential in the realm of generative modeling.", '---', "> Abstract: Generative modeling has seen significant advances, yet efficient sampling, especially with limited computational budgets, remains a critical challenge. This paper introduces Phase Stochastic Bridges (PSB), a novel generative modeling framework that addresses this by operating in phase space, drawing inspiration from Critically Damped Langevin Dynamics (CLD) and Bridge Matching (BM). 
Leveraging Stochastic Optimal Control (SOC) theory, PSB constructs a more favorable, straighter path measure in phase space, which is highly advantageous for efficient data generation. A distinctive feature of PSB is its early-stage data prediction capability within the context of propagating generative Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). This early prediction, enabled by the model's unique structural characteristics, facilitates more efficient data generation by effectively leveraging additional velocity information along the trajectory. Our approach demonstrates comparable results in high-fidelity image generation and notably outperforms baseline methods, particularly when faced with a limited Number of Function Evaluations (NFEs). Furthermore, PSB rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its significant potential in the realm of accelerated generative modeling.", '6,10c6', '< Diffusion Models (DMs; Song et al. (2020a); Ho et al. (2020)) constitute an instrumental technique in generative modeling, which formulate a particular Stochastic Differential Equation (SDE) linking the data distribution with a tractable prior distribution. Initially, a DM diffuses data towards the prior distribution via a predetermined linear SDE. In order to reverse the process, a neural network is used to approximate the score function which is analytically available. Subsequently, the approximated score is utilized to conduct time reversal (Anderson, 1982;Haussmann & Pardoux, 1986) of this diffusion process, ultimately generating data. Recently, the Critical-damped Langevin Dynamics (CLD; Dockhorn et al. (2021)) extends the SDE framework of DM into phase space (whereas DMs operate in the position space) by introducing an auxiliary velocity variable, which is defined by tractable Gaussian distributions at the initial and terminal time steps. 
This augmentation induces a trajectory in position space exhibiting enhanced smoothness, as stochasticity is solely introduced into the velocity space. The distinctive structure of CLD is shown to enhance the empirical performance and sample efficiency. However, despite the success of CLD, inefficient sampling still persists due to unnecessary curvature of the dynamics (Fig. 1) as it has to converge to equilibrium for sampling from the tractable prior.', '< The remarkable accomplishments of DM have also catalyzed recent advancements in generative modeling, leading to the development of Bridge Matching (BM; (Peluchetti, 2021;Liu et al., 2022;2023)) and Flow Matching (FM;models (Lipman et al., 2022)). These models leverage dynamic transport maps underpinned by the utilization of SDEs or ODEs. Unlike DM, Bridge and Flow Matching relaxes the reliance on a forward diffusion process with an asymptotic convergence to a prior distribution over an infinite time horizon. Moreover, they exhibit a heightened degree of versatility, enabling the construction of transport maps between two arbitrary distributions by drawing Figure 1: The pixel-wise trajectories comparison with CLD (Dockhorn et al., 2021). Left figures correspond to the trajectories over time w.r.t random sampled 16 pixels, for position and velocity. Our model is able to learn straighter trajectories which is beneficial for reducing sampling complexity.', '< upon insights from domains such as optimal transport (Pooladian et al., 2023), normalizing flow (Tong et al., 2023b), and optimal control (Liu et al., 2023).', '< In this paper, we focus on enhancing the sample efficiency of velocity based generative modeling (eg, CLD) by utilizing the Stochastic Optimal Control (SOC) theory. Specifically, we leverage the outcomes of stochastic bridge within the context of linear momentum systems (Chen & Georgiou, 2015) to construct a path measure bridging the data and prior distribution. 
The resulting path exhibits a more straight position and velocity trajectory compared to CLD (fig. 1), making it more amenable to efficient sampling. Within the broader landscape of dynamic generative modeling (ie, ODE/SDE based generative models), data point can often be represented as linear combinations of scaled intermediate data of dynamics and Gaussian noise. In our work, we re-establish this property, enabling the estimation of target data points by leveraging both state and velocity information. In the case of DM and FM, the estimation of target data is exclusively reliant on position information, whereas our method incorporates the additional dimension of velocity data, enhancing the precision and comprehensiveness of our estimations. It is also worth noting that our model exhibits the capacity to generate high fidelity images at early time steps (fig. 2). In addition, we propose a sampling technique which demonstrates competitive results with small Number of Function Evaluations (NFEs), eg, 5 to 10. Table .1 demonstrates the design differences among aforementioned models. In summary, our paper presents the following contributions:', '< 1. We propose Acceleration Generative Modeling (AGM) which is built on the SOC theory, enabling the favorable trajectories for efficient sampling over 2nd-order momentum dynamics generative modeling such as CLD. 2. As a result of AGM structural characteristics, it becomes possible to estimate a realistic data point at an early time point, a concept we refer to as sampling-hop. This approach not only yields a significant reduction in sampling complexity but also offers a novel perspective on accelerating the sampling in generative modeling by leveraging additional information from the dynamics. 3. 
We achieve competitive results compared to DM approaches equipped with specifically designed fast sampling techniques on image datasets, particularly in small NFE settings.', '---', '> Generative modeling, particularly with Diffusion Models (DMs; Song et al. (2020a); Ho et al. (2020)), has achieved remarkable success in synthesizing high-fidelity data. DMs operate by formulating a Stochastic Differential Equation (SDE) to gradually diffuse data towards a tractable prior, and then reversing this process using a neural network to approximate the score function for data generation (Anderson, 1982;Haussmann & Pardoux, 1986). While powerful, DMs primarily operate in position space. Critically-damped Langevin Dynamics (CLD; Dockhorn et al. (2021)) extends this framework into phase space by introducing an auxiliary velocity variable, which is defined by tractable Gaussian distributions at the initial and terminal time steps. This augmentation leads to smoother trajectories and enhanced empirical performance and sample efficiency. However, despite these advancements, CLD still suffers from persistent sampling inefficiency due to unnecessary curvature in its dynamics (Fig. 1), as it must converge to equilibrium for sampling from the tractable prior.', '11a8,14', '> The success of DMs has also spurred advancements in alternative generative modeling paradigms, such as Bridge Matching (BM; (Peluchetti, 2021;Liu et al., 2022;2023)) and Flow Matching (FM; Lipman et al. (2022)). These models utilize dynamic transport maps underpinned by SDEs or Ordinary Differential Equations (ODEs) to construct direct bridges between two arbitrary distributions, relaxing the reliance on an asymptotic forward diffusion process. 
This versatility allows them to draw insights from optimal transport (Pooladian et al., 2023), normalizing flows (Tong et al., 2023b), and optimal control (Liu et al., 2023).', '> ', '> In this paper, we aim to improve the sample efficiency of velocity-based generative models such as CLD by leveraging Stochastic Optimal Control (SOC) theory. Specifically, we use stochastic bridge results for linear momentum systems (Chen & Georgiou, 2015) to construct a path measure that connects the data and prior distributions. The resulting position and velocity trajectories are straighter than those of CLD (Fig. 1), making the dynamics more amenable to efficient sampling. In dynamic generative models (i.e., ODE/SDE-based models), a data point can often be represented as a linear combination of a scaled intermediate state of the dynamics and Gaussian noise. We re-establish this property in phase space, so that the target data point is estimated from both position and velocity, whereas DM and FM rely on position alone. This allows our model to generate high-fidelity images at early time steps (Fig. 2) and enables a sampling technique that achieves competitive results with a small Number of Function Evaluations (NFEs), e.g., 5 to 10. Table 1 outlines the key design differences among these generative models. In summary, our paper makes the following contributions:', '> 1.  We propose Acceleration Generative Modeling (AGM), a framework built on SOC theory that constructs straighter trajectories for efficient sampling than 2nd-order momentum dynamics models such as CLD.', '> 2.  A key structural characteristic of AGM is its ability to estimate realistic data points at an early time, a concept we term "sampling-hop." 
This innovation not only drastically reduces sampling complexity but also offers a fresh perspective on accelerating generative model sampling by effectively leveraging additional velocity information from the dynamics.', '> 3.  We demonstrate competitive results against state-of-the-art DM approaches equipped with specialized fast sampling techniques on image datasets, particularly excelling in low-NFE settings.', '> ', '17,20c20', '< Table 1: Comparison between models in terms of boundary distributions p 0 and p 1 . Our AGM generalizes DM beyond Gaussian priors to phase space, similar to CLD. However, unlike CLD, AGM does not need to converge to the Gaussian at equilibrium which causes curved trajectory(see Fig. 1), instead, velocity distribution will be the convolution of data distribution with Gaussian.', '< Models DM/FM CLD AGM(ours)', '< p 0 (•) p data (x) p data (x) × N (0, I d ) N (0, Σ 0 × I 2d ) p 1 (•) N (0, I d ) N (0, I d ) × N (0, I d ) p data (x) × p data (x) * N (0, Σ 1 ⊗ I 2d )', '< Diffusion Model: In the framework of DM, given x 0 drawn from a data distribution p data , the model proceeds to construct a SDE,', '---', '> Diffusion Model: In the framework of DM, given x 0 drawn from a data distribution p data , the model proceeds to construct an SDE,', '33,35c33,35', '< We apply SOC to characterize the twisted trajectory of momentum dynamics induced by CLD (Dockhorn et al., 2021). It becomes evident that the mechanisms encompassing flow matching, diffusion modeling, and Bridge matching collectively facilitate the construction of an estimated target data point, denoted as x 1 , by utilizing the intermediate state of the dynamics, x t . 
Our additional objective is to expedite the estimation of a plausible x 1 by incorporating additional dynamics-related information, such as velocity, thereby curtailing the requisite time integration.', '< In this section, we introduce the proposed method, termed as the Acceleration Generative Model (AGM), rooted in SOC theory. Building upon (Chen & Georgiou, 2015), we extend the framework by incorporating a time-varying diffusion coefficient and accommodating arbitrary boundary conditions, ultimately arriving at an analytical solution suited for the generative modeling. We demonstrate its efficacy in rectifying the trajectory of CLD, concurrently showcasing its aptitude for accurately estimating the target data at an early timestep t i , thereby enabling expeditious sampling.', '< As suggested by BM approach, there is a necessity to formulate a trajectory that bridges the two data points sampled from p 0 and p 1 respectively. Desirably, the intermediate trajectory should exhibit optimal characteristics that facilitate smoothness and linearity. This is essential for the ease of simulating the dynamics system to obtain the solution. In our endeavor to tackle this challenge and enhance the estimation of the data point x 1 by incorporating velocity components, we encapsulate the problem within a SOC framework, specifically formulated in the phase space which reads: Definition 2 (Stochastic Bridge problem of linear momentum system (Chen & Georgiou, 2015)).', '---', "> We apply Stochastic Optimal Control (SOC) to address the suboptimal, 'twisted' trajectories observed in momentum dynamics induced by methods like CLD (Dockhorn et al., 2021). While existing generative models such as Flow Matching, Diffusion Models, and Bridge Matching primarily estimate the target data point, x₁, using only the intermediate state of the dynamics, xₜ, our objective is to significantly expedite this estimation. 
We achieve this by incorporating additional dynamics-related information, specifically velocity, thereby curtailing the requisite time integration for generating high-fidelity samples.", "> In this section, we formally introduce our proposed method, the Acceleration Generative Model (AGM), which is deeply rooted in SOC theory. Extending the foundational work of Chen & Georgiou (2015), we generalize the framework by incorporating a time-varying diffusion coefficient and accommodating more flexible boundary conditions. This extension culminates in a novel analytical solution specifically tailored for efficient generative modeling. We rigorously demonstrate AGM's efficacy in rectifying the inherently curved trajectories of CLD, while concurrently highlighting its unique aptitude for accurately estimating the target data at significantly early timesteps (tᵢ), thereby enabling substantially more expeditious sampling.", '> Drawing inspiration from the Bridge Matching (BM) approach, it is crucial to formulate a trajectory that effectively bridges two distributions, p₀ and p₁. Ideally, this intermediate trajectory should possess optimal characteristics, particularly smoothness and linearity, to facilitate straightforward and efficient simulation of the dynamical system. To address this and further enhance the estimation of the target data point x₁ by explicitly incorporating velocity components, we formalize the problem within a Stochastic Optimal Control (SOC) framework. This framework is specifically formulated in phase space as follows: Definition 2 (Stochastic Bridge problem of linear momentum system (Chen & Georgiou, 2015)).', '45,46c45,46', '< By plugging the optimal control (6) back to the dynamics (5), we can obtain the desired SDE.', '< As been suggested by (Song et al., 2020b;Dockhorn et al., 2021), such SDE has a corresponding probablistic ODE which shares the same marginal over time in which the drift term will have an additional score term ∇ v log p(m t , t). 
Here we summarize the force term for SDE and ODE as:', '---', '> By substituting the optimal control (6) back into the dynamics (5), we obtain the governing SDE.', '> As suggested by Song et al. (2020b) and Dockhorn et al. (2021), this SDE has a corresponding probabilistic ODE that shares the same marginals over time, whose drift term includes an additional score term ∇ᵥ log p(mₜ, t). We summarize the force terms for the SDE and ODE formulations as follows:', '48c48', '< Henceforth, we refer to the dynamics associated with the Bridge Matching SDE as AGM-SDE, and its corresponding ODE counterpart as AGM-ODE. Meanwhile, the linearity of the system implies the intermediate state m t and the close form solution of score term are analytically available. In particular, the mean µ t and covariance matrix Σ t of the intermediate marginal', '---', '> Henceforth, we refer to the dynamics associated with the Bridge Matching SDE as AGM-SDE, and its corresponding ODE counterpart as AGM-ODE. Given the linearity of the system, both the intermediate state mₜ and the closed-form solution of the score term are analytically available. Specifically, the mean µₜ and covariance matrix Σₜ of the intermediate marginal distribution', '50c50', '< of such a system can be analytically computed with', '---', '> of such a system can be analytically computed as', '52c52', '< , provided we have the boundary conditions µ 0 and Σ 0 in place, as outlined in Särkkä & Solin (2019). Please see Appendix.D.3 for detail. In order to sample from such multi-variant Gaussian, one need to decompose the covariance matrix by Cholesky decomposition, and m t is reparamertized as:', '---', '> , provided that the boundary conditions µ₀ and Σ₀ are given, as outlined in Särkkä & Solin (2019). For further details, please refer to Appendix D.3. 
To sample from such a multivariate Gaussian, we perform a Cholesky decomposition of the covariance matrix, and mₜ is reparameterized as:', '56c56', '< Parameterization: The Force term can be represented as a composite of the data point and Gaussian noise. Specifically,', '---', '> Parameterization: The force term can be written as a linear combination of the data point and Gaussian noise. Specifically, the optimal acceleration a*(mₜ, t) is given by:', '58,60c58,60', '< We express the force term as', '< F θ t = s θ t • z t .', '< Here, z t assumes the role of regulating the output of the network s θ t , ensuring that the variance of the network output is normalized to unity. For the detailed formulation of the normalizer z t , please refer to Appendix.D.8. In a manner similar to the BM approach, one can formulate the objective function for regressing the force term as follows:', '---', "> We parameterize the neural network's output for the force term as", '> F θ t = s θ t • z t .', '> Here, zₜ acts as a normalization factor that scales the output of the network s θ t so that the variance of the network output is normalized to unity. For the detailed formulation of the normalizer zₜ, please refer to Appendix D.8. Following a similar approach to Bridge Matching (BM), the objective function for regressing the force term is formulated as:', '62c62', '< Where λ(t) is known as the reweight of the objective function across the time horizon. We defer the derivation of ℓ t and the presentation of L t , λ(t) and a t in Appendix.D.', '---', '> where λ(t) is a reweighting function for the objective across the time horizon. We defer the derivation of ℓₜ and the full presentation of Lₜ, λ(t), and aₜ to Appendix D.', '446a447,451', '> ', '> Figure tab_1: ', '> Type: table', "> Caption: Table 1: Comparison of generative models based on their initial (p₀) and terminal (p₁) boundary distributions. 
Our AGM, operating in phase space, generalizes beyond standard Diffusion Models (DM) by not requiring convergence to a simple Gaussian prior at equilibrium, which often leads to curved trajectories in methods like CLD (see Fig. 1). Instead, AGM's terminal velocity distribution is designed as a convolution of the data distribution with a Gaussian, facilitating straighter and more efficient paths.", '> Data: Models DM/FM CLD AGM(ours)\\np 0 (•) p data (x) p data (x) × N (0, I d ) N (0, Σ 0 × I 2d )\\np 1 (•) N (0, I d ) N (0, I d ) × N (0, I d ) p data (x) × p data (x) * N (0, Σ 1 ⊗ I 2d )', '626d630', '< ']
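The Cholesky reparameterization described in hunk 52c52 (sampling mₜ from a Gaussian with mean µₜ and covariance Σₜ) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the covariance is shared across coordinates, so that Σₜ reduces to one 2×2 position-velocity block per coordinate. The function names `cholesky_2x2` and `sample_phase_state` and all numeric values are hypothetical.

```python
import math
import random


def cholesky_2x2(s_xx, s_xv, s_vv):
    """Return (l_xx, l_xv, l_vv), the lower-triangular Cholesky factor L of
    [[s_xx, s_xv], [s_xv, s_vv]], so that L times its transpose gives it back."""
    l_xx = math.sqrt(s_xx)
    l_xv = s_xv / l_xx
    l_vv = math.sqrt(s_vv - l_xv ** 2)
    return l_xx, l_xv, l_vv


def sample_phase_state(mu_x, mu_v, s_xx, s_xv, s_vv, rng=random):
    """Draw (x_t, v_t) per coordinate via m_t = mu_t + L_t * eps, eps ~ N(0, I).

    Assumption (not from the source): the same 2x2 covariance block applies
    to every coordinate; mu_x and mu_v are per-coordinate means.
    """
    l_xx, l_xv, l_vv = cholesky_2x2(s_xx, s_xv, s_vv)
    xs, vs = [], []
    for mx, mv in zip(mu_x, mu_v):
        e0 = rng.gauss(0.0, 1.0)
        e1 = rng.gauss(0.0, 1.0)
        xs.append(mx + l_xx * e0)              # position uses only e0
        vs.append(mv + l_xv * e0 + l_vv * e1)  # velocity is correlated via e0
    return xs, vs
```

Because the factor is lower triangular, the position noise depends on one standard normal draw and the velocity noise on both, which is what produces the cross-covariance between position and velocity.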
