Abstract: Deep sequence models have achieved remarkable success across a wide range of data modalities. Despite their exceptional predictive performance, the main concern surrounding their deployment is a lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express, in a principled way, our beliefs about unobserved variables given observed ones. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of deep sequence models themselves can inform the specification of priors or the choice of approximation methods in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer architecture, based on the similarity between the attention mechanism and sparse Gaussian processes. Next, we exploit the long-range memory preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct interdomain inducing points for Gaussian processes, which successfully memorize the history in online or continual learning. Beyond the progress of deep sequence models on predictive tasks, sequential generative models consisting of a sequence of latent variables (e.g., diffusion models) have become popular in the domain of deep generative modeling. Inspired by the explicit self-supervised signals for these latent variables in diffusion models, in Chapter 5 we explore the possibility of improving other deep generative models with self-supervised signals for their latent states, and investigate desirable probabilistic structures over the sequence of latent states in sequential generation.
Overall, this thesis leverages the inductive biases of deep sequence models to design probabilistic inference methods and structures, bridging the gap between deep sequence models and probabilistic models and yielding mutually reinforcing improvements.