% \documentclass{uai2025} % for initial submission
\documentclass[accepted]{uai2025} % after acceptance, for a revised version; 
% also before submission to see how the non-anonymous paper would look like 

% if you need to pass options to natbib, use, e.g.:
%     \PassOptionsToPackage{numbers, compress}{natbib}

\usepackage[american]{babel}
% \usepackage[british]{babel}

%% Some suggested packages, as needed:
\usepackage{natbib} % has a nice set of citation styles and commands
    \bibliographystyle{plainnat}
    \renewcommand{\bibsection}{\subsubsection*{References}}
\usepackage{mathtools} % amsmath with fixes and additions
% \usepackage{siunitx} % for proper typesetting of numbers and units
\usepackage{booktabs} % commands to create good-looking tables
\usepackage{tikz} % nice language for creating drawings and diagrams

\usepackage[utf8]{inputenc} % allow utf-8 input
\usepackage[T1]{fontenc}    % use 8-bit T1 fonts
\usepackage{hyperref}       % hyperlinks
\usepackage{url}            % simple URL typesetting
\usepackage{amsfonts}       % blackboard math symbols
\usepackage{nicefrac}       % compact symbols for 1/2, etc.
\usepackage{microtype}      % microtypography
\usepackage{xcolor}         % colors
\usepackage{graphicx}
\usepackage{algorithm}
\usepackage{algorithmic}
\usepackage{amsmath}
\usepackage{enumitem}
% \usepackage{paralist}
\usepackage{caption}
\usepackage{xparse}
\usepackage{authblk}

\ExplSyntaxOn
\NewDocumentCommand{\longdash}{ O{2} }
 {
  --\prg_replicate:nn { #1 - 1 } { \negthinspace -- }
 }
\ExplSyntaxOff

\input{math_commands}
\newcommand{\hdpflow}{HDP-Flow}



\title{HDP-Flow: Generalizable Bayesian Nonparametric Model for Time Series State Discovery}

% Add authors
% \author[1]{\href{mailto:<jj@example.edu>?Subject=Your UAI 2025 paper}{Jane~J.~von~O'L\'opez}{}}
\author[1,2]{Sana Tonekaboni}
\author[2]{Tina Behrouzi}
\author[2]{Addison Weatherhead}
\author[3]{Emily B. Fox}
\author[4]{David Blei}
\author[2]{Anna Goldenberg}
% Add affiliations after the authors
\affil[1]{%
    Eric and Wendy Schmidt Center\\
    Broad Institute of MIT and Harvard\\
    Cambridge, MA, USA
}
\affil[2]{%
    Department of Computer Science and Vector Institute of AI\\
    University of Toronto\\
    Toronto, ON, Canada
}
\affil[3]{%
    Department of Statistics and Computer Science\\
    Stanford University\\
    Stanford, CA, USA
}
\affil[4]{%
    Department of Statistics and Computer Science\\
    Columbia University\\
    New York, NY, USA
}

\begin{document}
\maketitle

\begin{abstract}
We introduce \hdpflow, a Bayesian nonparametric (BNP) model for unsupervised state discovery in dynamic, non-stationary time series data. Unlike prior work that assumes fixed states, \hdpflow\ models evolving datasets with unknown and variable latent states. By integrating the adaptability of BNP models with the expressive power of normalizing flows, \hdpflow\ effectively models dynamic, non-stationary patterns, while learning transferable states across datasets with well-calibrated uncertainty. We propose a scalable variational algorithm to enable efficient inference, addressing the limitations of traditional sampling-based BNP methods.
\hdpflow\ also provides probabilistic insight into state distributions and transition dynamics.
Evaluating \hdpflow\ across two wearable datasets demonstrates the transferability of states across diverse sub-populations, validating its robustness and generalizability. 
We demonstrate that \hdpflow\ outperforms existing nonparametric models in latent state identification, particularly in the face of non-stationary states. In most cases, it even performs better than models that have prior information about the number of states. Additionally, we show that \hdpflow's variational inference algorithm successfully scales to long time series, where sampling-based inference fails, showcasing the model's practical utility for large-scale analyses. 
\end{abstract}

\setcounter{footnote}{0}
\renewcommand{\thefootnote}{\arabic{footnote}}


\section{Introduction}

Unsupervised modeling of latent states in time series can reveal the underlying processes that generate the data. 
For example, in healthcare, physiological metrics such as heart rate and respiratory rate can be used to infer the underlying health state of a patient, allowing the identification, prediction or tracking of various health conditions \citep{pantelopoulos2009survey, nazaret2023modeling}. Unsupervised representation learning methods have successfully encoded time series data to capture underlying states \citep{franceschi2019unsupervised, tonekaboni2021unsupervised, zhang2022self, yu2022latent, zhou2023deep}. However, these methods often require prior knowledge of the number of states and cannot adapt to evolving conditions. In real-world scenarios, the number and distribution of states can change over time. For instance, the emergence of a new disease would increase the representation of a previously unrepresented state. Models that can adapt to these changes and accommodate a potentially unbounded number of states are essential for many applications.
Bayesian nonparametric (BNP) models offer a solution to this problem \citep{orbanz2010bayesian, hjort2010bayesian, lorek2022flowhmm}, but often rely on assumptions that are too simplistic for real-world time series data. In particular, while allowing for an unbounded number of states, these models assume simple parametric state descriptions. 

In this paper, we introduce a BNP sequence model called \hdpflow. \hdpflow\ combines nonparametric modeling of state dynamics with the expressivity of deep generative modeling, all while ensuring computational efficiency. 
There are three main components to \hdpflow:
(1) To model state dynamics, \hdpflow\ builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) \citep{teh2006hierarchical}. Specifically, it uses the sticky HDP-HMM \citep{fox2011sticky}, a sequence model that lets the number of states adapt to the observed data and learns realistic transitions by encouraging state persistence.
(2) To capture the intricate structure of real-world time series, \hdpflow\ integrates the sticky HDP-HMM with conditional normalizing flows \citep{papamakarios2017masked}, enabling modeling of complex state-specific emissions.
(3) To capture non-stationarity within states, \hdpflow\ introduces a time-conditioning mechanism that tracks state duration and conditions the observation distribution on the number of time steps within the state. This enables modeling trends, periodicity, and other forms of non-stationarity. Traditional HDP-HMMs, with their Markov assumption and static emission distributions, fail to capture such non-stationary states.
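
As a rough sketch of how these three components fit together, consider the following generative process. The notation is illustrative and assumed for this introduction only; the precise formulation is the one given in the method section. Let $z_t$ denote the latent state at time $t$, $y_t$ the observation, and $d_t$ the number of consecutive time steps spent in the current state. The symbols $f_k$ (a state-specific invertible flow mapping observations to a base space) and $p_0$ (its base density) are assumed names, and starting the duration counter at $1$ is likewise an assumption. The prior on transitions follows the standard sticky HDP-HMM construction of \citet{fox2011sticky}, with concentration parameters $\gamma$ and $\alpha$ and self-transition bias $\kappa$:
\begin{align*}
\beta &\sim \mathrm{GEM}(\gamma), \\
\pi_j \mid \beta &\sim \mathrm{DP}\!\left(\alpha + \kappa,\ \frac{\alpha \beta + \kappa \delta_j}{\alpha + \kappa}\right), \\
z_t \mid z_{t-1} &\sim \pi_{z_{t-1}}, \\
d_t &=
\begin{cases}
d_{t-1} + 1 & \text{if } z_t = z_{t-1}, \\
1 & \text{otherwise},
\end{cases} \\
p(y_t \mid z_t = k, d_t) &= p_0\big(f_k(y_t; d_t)\big) \\
&\quad \times \left| \det \frac{\partial f_k(y_t; d_t)}{\partial y_t} \right| .
\end{align*}
The first three lines correspond to component (1), the change-of-variables density in the last line to component (2), and the dependence of $f_k$ on $d_t$ to component (3).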

Most BNP models rely on sampling-based methods for inference, such as Markov chain Monte Carlo \citep{neal2000markov} and Gibbs sampling \citep{teh2004sharing}, which can become computationally intractable when analyzing long time series across large cohorts. To address this limitation, we employ an efficient stochastic variational inference (SVI) algorithm based on black-box variational inference (BBVI) \citep{ranganath2014black}. This approach enables effective handling of the complex distributions and dependencies inherent in the generative process of \hdpflow, making it scalable for large-scale applications.
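
For intuition, the score-function gradient estimator underlying BBVI \citep{ranganath2014black} can be written generically as follows. Here $x$ denotes the observed data, $\theta$ all latent variables, and $q(\theta; \lambda)$ a variational family with parameters $\lambda$; these symbols are generic placeholders and not the specific variational family used for \hdpflow.
\begin{align*}
\mathcal{L}(\lambda) &= \mathbb{E}_{q(\theta; \lambda)}\big[\log p(x, \theta) - \log q(\theta; \lambda)\big], \\
\nabla_\lambda \mathcal{L}(\lambda) &= \mathbb{E}_{q(\theta; \lambda)}\big[\nabla_\lambda \log q(\theta; \lambda) \\
&\qquad \times \big(\log p(x, \theta) - \log q(\theta; \lambda)\big)\big] \\
&\approx \frac{1}{S} \sum_{s=1}^{S} \nabla_\lambda \log q(\theta^{(s)}; \lambda) \\
&\qquad \times \big(\log p(x, \theta^{(s)}) - \log q(\theta^{(s)}; \lambda)\big),
\end{align*}
where $\theta^{(s)} \sim q(\theta; \lambda)$ for $s = 1, \dots, S$. The estimator only requires sampling from $q$ and evaluating $\log p(x, \theta)$ and $\log q(\theta; \lambda)$, so it does not depend on conjugacy between the prior and the emission model.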

We evaluate \hdpflow\ on both real and simulated datasets, comparing the learned states to those of other nonparametric and parametric models. 
\hdpflow\ consistently outperforms nonparametric models in identifying latent states, particularly in settings with non-stationary emissions. It also approximates the true data distribution within each state more closely than the other models. 
Additionally, we test the generalizability of \hdpflow\ across two cohorts, demonstrating its ability to adapt to new datasets and provide insights into physiological changes in humans. Finally, when applied to long time series data of human activities, we showcase the superior scalability of \hdpflow's SVI algorithm compared to sampling-based inference methods.

\input{related_work}
\input{method_w_wearables} %add the dynamic kappa param updates during training
\input{eval_w_wearables}
\input{bump}
\vspace{-6pt}
\section{Conclusion}
\vspace{-4pt}

We present \hdpflow, a Bayesian nonparametric model for unsupervised latent state modeling in time series. By unifying the adaptability of Bayesian nonparametrics with the expressive power of conditional normalizing flows, \hdpflow\ captures non-stationary and evolving states in uncontrolled environments with minimal prior knowledge, while retaining efficient variational inference for complex real-world time series dynamics. Our results demonstrate superior performance in learning latent states and highlight the transferability of these states across sub-populations. However, this flexibility also presents a common challenge in Bayesian nonparametrics: determining the optimal state granularity for structured tasks. Careful tuning of priors is crucial to balance model growth and avoid unnecessary complexity. Although \hdpflow\ is computationally more intensive than standard deterministic neural networks, its Bayesian framework provides a structured representation of latent states, uncertainty estimates, and a generative understanding of observations, making it a powerful tool for inference and modeling in evolving time series.
%This flexibility also introduces a common challenge in Bayesian nonparametrics: determining the right granularity of states for structured machine learning tasks. Careful tuning of priors is necessary to balance model growth and prevent unnecessary complexity. 
%While training \hdpflow\ is more computationally intensive than standard deterministic neural networks, its Bayesian framework provides a structured view of latent states, uncertainty estimates, and a generative understanding of observations; making it a powerful tool for both inference and modeling evolving time series. 
% Future work will extend \hdpflow to broader wearable data, incorporate human-in-the-loop learning, and optimize computational efficiency. 


%We have proposed \hdpflow\, a Bayesian nonparametric model for unsupervised representation learning in time series. By combining the flexibility of Bayesian nonparametrics and the expressivity of conditional normalizing flows, we effectively models non-stationary and evolving states while maintaining an efficient variational inference framework for learning complex real-world time series dynamics. \hdpflow\ provides a tool for discovering latent states in uncontrolled environments with minimal prior knowledge, making it ideal for exploratory analysis. 

%However, this flexibility introduces a common challenge in Bayesian nonparametrics: determining the right granularity of states for structured machine learning tasks. Careful tuning of priors is necessary to balance model growth and prevent unnecessary complexity. While training \hdpflow\ is more computationally intensive than standard unsupervised neural networks, its Bayesian framework provides a structured view of latent states, uncertainty estimates, and a generative understanding of observations—making it a powerful tool for both inference and modeling evolving time series.


% This work presents an inference network for clinicians to visualize changes in a person's states. We demonstrated that, when trained on a specific cohort, the network provides informative insights for comparisons between cohorts. Limitation: The approach relies on a human-in-the-loop process, which represents a future direction to incorporate patient feedback. This feedback could help identify the key drivers of state changes and distinguish between different state transitions.

\begin{acknowledgements} % will be removed in pdf for initial submission,
						 % (without ‘accepted’ option in \documentclass)
                         % so you can already fill it to test with the
                         % ‘accepted’ class option
Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute. This work was supported in part by ONR Grant N00014-22-1-2110, NSF Grant 2205084, and the Stanford Institute for Human-Centered Artificial Intelligence (HAI). EBF is a Chan Zuckerberg Biohub – San Francisco Investigator.
\end{acknowledgements}

\bibliography{ref}

\newpage

\onecolumn

\title{{HDP-Flow: Generalizable Bayesian Nonparametric Model for Time Series State Discovery}\\(Supplementary Material)}
\maketitle

\vspace{0.7cm}
% \appendix
\input{appendix_w_wearables}

\end{document}