
\section{Clinical Case Study}


Wearable devices capture rich, longitudinal health data in real-world settings, offering insights into patient well-being beyond the clinic. However, high variability, sensor noise, and scarce annotations make these data difficult to interpret, and many physiological patterns and signal variations remain poorly understood. Because \hdpflow\ identifies latent states in time series with little prior information about their distribution or number, it is particularly well suited to wearable healthcare applications, providing a principled way to discover latent states and refine them adaptively. Here, we show that \hdpflow\ extracts meaningful states from wearable data and that these states generalize across datasets, enabling a growing repository of states for interpretable health monitoring across studies.
% Here, we show \hdpflow's functionality in such settings. 

% Wearable devices capture rich, longitudinal health data in real-world settings, offering insights into patient well-being beyond the clinic. However, high variability, sensor noise, and scarce annotations make interpretation challenging, and many physiological patterns and signal variations remain poorly understood. \hdpflow\ excels in identifying latent states in time series without prior knowledge of their distribution or number. \hdpflow\ provides a principled way to discover and adaptively refine latent states, making it ideal for wearable healthcare applications. It provides a principled approach to discovering and refining states adaptively. Here, we demonstrate that \hdpflow\ extracts meaningful states from wearable data and that these states generalize across datasets, enabling the creation of a growing repository for interpretable health monitoring across studies.
 


% \paragraph{Cross-Cohort Generalizability Analysis:}
% The evaluation is conducted using the Bump dataset and the Stress and Crohn’s dataset to analyze the generalizability and physiological state changes across cohorts.

%\vspace{-2mm}
\fontsize{11}{9.5}\selectfont\textbf{Datasets: }\normalsize
The Stress in Crohn’s dataset tracked $112$ patients with Crohn’s disease using the Oura ring to evaluate stress monitoring for symptom prediction, alongside surveys on flare-ups, medical history, and treatments\footnote{https://clinicaltrials.gov/study/NCT04809194}. Similarly, the BUMP study \citep{goodday2022better} monitored physiological and psychological changes in $431$ pregnant participants, of whom we use $256$ (see Appendix~\ref{subsubsection:bump_crohn_data} for inclusion criteria).
% For this work, we include only Oura ring data and survey responses from BUMP.
% The Stress in Crohn’s dataset tracked $112$ Crohn’s disease patients, collecting physiological data via the Oura ring to assess stress monitoring for symptom prediction. It includes comprehensive surveys on flare-ups, medical history, hospital visits, and surgeries\footnote{https://clinicaltrials.gov/study/NCT04809194}. Similarly, the Better Understanding of Metamorphosis of Pregnancy (BUMP) study \citep{goodday2022better} monitored $431$ participants, $256$ of which we use in this work (see inclusion criteria in the Appendix), using wearable devices to explore physiological and psychological changes during pregnancy. For the purposes of this work, we have only included the Oura ring physiological measurements as well as survey data from the BUMP study.



% \vspace{-2mm}
% \paragraph{Challenges: }Wearable data exhibits high variability due to its collection in uncontrolled environments, offering the advantage of longitudinal tracking but at the cost of increased noise and lack of standardized recording conditions. Additionally, many wearable signal patterns and variations remain poorly understood. With unknown states, supervised models are not feasible, necessitating an exploratory and probabilistic framework.

% In this study, we assess \hdpflow's generalization performance by training on one dataset and testing on another. This approach is crucial for future human-in-the-loop evaluations, where limited training data necessitates real-time identification of wearable signal states and optimal intervention timing. The model’s interpretable probabilistic structure enables uncertainty estimation, detecting previously unseen states or excessive noise, and prompting participant notifications for improved physiological monitoring.

% \textbf{\hdpflow} provides a key advantage by dynamically identifying different states within multidimensional signals without prior knowledge of the number of states. In this study, we assess \hdpflow\ \'s generalization performance by training on one dataset and testing on another. This approach is crucial for future human-in-the-loop evaluations, where limited training data necessitates real-time identification of wearable signal states and optimal intervention timing. The model’s interpretable probabilistic structure enables uncertainty estimation, detecting previously unseen states or excessive noise, and prompting participant notifications for improved physiological monitoring.

% \begin{itemize}
%     \item Describe the dataset
%     \item Describe the challenges with modeling such data
%     \item Describe how HDPFlow helps and tie it to the human-in-the-loop future ideas.
% \end{itemize}
%\vspace{0.5mm}
\fontsize{11}{9.5}\selectfont\textbf{Experiments: }\normalsize
%\vspace{2mm}
Unlike wearable datasets for human activity recognition (HAR), which are typically collected in controlled environments with well-defined states, the wearable data in these studies capture complex, uncontrolled dynamics driven by many underlying factors, with no clear state definitions. We therefore present an exploratory analysis of the latent states learned on these real-world datasets.


\begin{figure}[h]
    \centering
    \includegraphics[width=0.49\textwidth]{figures/figures_bump/Paired_data2.jpg}
    \caption{State-wise Distribution of Paired BUMP and Crohn’s Data. (a) Heatmap of the proportion of paired individuals with similar Beta distributions of Sleep-Related Impairment, Pain Interference, and Feeling in Control in each state. (b) Distribution of predicted states in the Crohn’s data.}
    %Heatmap showing the percentage of paired individuals where the beta distributions of Sleep Related Impairment, Pain Interference, and Feeling in Control are significantly similar within each state. Higher values indicate a more consistent distribution between Bump and Crohn’s data.}
    \label{fig:paired_features}
\end{figure}


% \begin{table*}
%     \centering
%     \begin{tabular}{lcccccc}
%         &\multicolumn{4}{c}{Crohns}&\multicolumn{2}{c}{Bump}\\
%         \toprule
%          & (Sleep) & (Stress) & (Flareup) & NLL & (Sleep) & (Stress)\\
%          \midrule
%          RNN (baseline) & & & & N/A\\
%          %direct state & & & &\\
%         % wo sleep &0.64 &0.59 & 0.84$\pm$0.15&972.1&0.78&0.65\\
%          with uncertainty & &  &\textbf{0.75$\pm$0.12}&\\%$\pm$478.1
%          \hdpflow & 0.59& 0.45 &0.77$\pm$0.14&6697.2&0.66&0.48\\ %$\pm$3204.5
%          %Bump 0.750$\pm$0.145 & 6289.3047$\pm$2100.3162
%          %Bump(trained) & & & &0.73$\pm$0.13&6842.2$\pm$2286.4
%     \end{tabular}
%     \caption{States assignments of subjective labels trained on Crohn’s data and tested on Bump data (measured through Hamming distance),analyzed separately without sleep features and without accounting for uncertainty, compared to the original \hdpflow.}
%     \label{tab:bump_classification}
% \end{table*}





%Table \ref{tab:bump_classification} shows the hamming disaccuracy of predictions trained on Bump and tested on Crohn’s, comparing results with and without uncertainty metrics calculated as explained in the section \ref{subsection: Uncertainty}. First, we concatenate the probability of belonging to each state with two demographic factors, age and BMI, collected at the beginning of the study.

% For HDP-Flow trained without sleep features, state stickiness is significantly lower, leading to the detection of 20 different states.
We first train \hdpflow\ on the Crohn’s disease population and analyze the distribution of the states it identifies (Figure~\ref{fig:paired_features}b). To interpret these states and assess whether they capture similar concepts across populations, we use subjective (self-reported) measures from the survey data.
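
One way to obtain a state distribution like the one in Figure~\ref{fig:paired_features}b is sketched below in Python. This is a minimal illustration, not \hdpflow's interface: it assumes the model outputs, for each individual, a matrix of per-time-step posterior state probabilities, and that each time step is hard-assigned to its most probable state. The input format and the helper \texttt{predicted\_state\_distribution} are hypothetical.

```python
import numpy as np

def predicted_state_distribution(state_probs, num_states):
    """Fraction of time steps assigned to each state across all individuals.

    state_probs: list of (T_i, num_states) arrays of per-time-step posterior
    state probabilities, one per individual (hypothetical input format).
    """
    # Hard-assign every time step to its most probable state.
    assignments = np.concatenate([p.argmax(axis=1) for p in state_probs])
    # Count assignments per state; minlength keeps states that never occur.
    counts = np.bincount(assignments, minlength=num_states)
    return counts / counts.sum()

# Toy example: two individuals, three states.
p1 = np.array([[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]])
p2 = np.array([[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]])
print(predicted_state_distribution([p1, p2], num_states=3))  # fractions 0.5, 0.25, 0.25
```

Averaging the soft probabilities over time steps instead of counting argmax assignments would be an equally reasonable variant that retains uncertainty.
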
To quantify how consistent the state-wise distributions of these measures are across the Crohn’s and BUMP populations, we perform a Beta-distribution Bayes factor analysis (detailed in Supplementary Section~\ref{subsection: bayes factor}). Figure~\ref{fig:paired_features}a shows, for each state and measure, the proportion of paired individuals from the two datasets whose Beta distributions are similar; higher values indicate greater cross-population alignment. This analysis also helps characterize state-specific patterns: state 5 predominantly captures pain, whereas state 7 aligns with feelings of control, a key indicator of stress.
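
The exact procedure is given in the supplement; as an illustration only, the Python sketch below substitutes a conjugate Beta--Binomial Bayes factor. It assumes that, within a state, each survey measure is binarized (for example, above or below a cutoff), so that a paired Crohn’s and BUMP individual contribute counts $(k_1, n_1)$ and $(k_2, n_2)$. With a Beta$(a,b)$ prior, the Bayes factor for a single shared rate versus two separate rates is
\begin{equation*}
\mathrm{BF}_{01} = \frac{B(a+k_1+k_2,\; b+n_1+n_2-k_1-k_2)\, B(a,b)}{B(a+k_1,\; b+n_1-k_1)\, B(a+k_2,\; b+n_2-k_2)},
\end{equation*}
and a pair is counted as similar when $\mathrm{BF}_{01}$ exceeds a threshold. The binarization, the uniform prior $a=b=1$, the threshold of $3$, and the function names are hypothetical assumptions.

```python
import math

def log_beta(x, y):
    """Log of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def bayes_factor_same(k1, n1, k2, n2, a=1.0, b=1.0):
    """BF_01 for 'one shared rate' vs. 'two separate rates' under Beta(a, b) priors.

    k1/n1 and k2/n2 are hypothetical counts of high survey responses among the
    state-assigned observations of a paired Crohn's and BUMP individual.
    Binomial coefficients cancel between the two hypotheses and are omitted.
    """
    log_shared = log_beta(a + k1 + k2, b + n1 + n2 - k1 - k2) - log_beta(a, b)
    log_separate = (log_beta(a + k1, b + n1 - k1) - log_beta(a, b)
                    + log_beta(a + k2, b + n2 - k2) - log_beta(a, b))
    return math.exp(log_shared - log_separate)

def fraction_similar(pairs, threshold=3.0):
    """Fraction of (k1, n1, k2, n2) pairs whose BF_01 exceeds the threshold."""
    if not pairs:
        return float("nan")
    return sum(bayes_factor_same(*p) > threshold for p in pairs) / len(pairs)

# Toy pairs (illustrative numbers only): matching rates vs. opposite rates.
print(bayes_factor_same(25, 50, 25, 50))  # > 1: evidence for a shared rate
print(bayes_factor_same(10, 10, 0, 10))   # << 1: evidence for different rates
```

Under this sketch, one heatmap cell would be \texttt{fraction\_similar} evaluated over all pairs for a given state and survey measure.
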
% Due to the minimum data requirement for robust distribution difference calculations, states 6 and 7 for Pain Interference, which have lower frequencies, are empty. 
% the Sleep Impairment distribution is predominantly observed in blue, brown, and yellow states (States 0, 1, and 3). The distribution similarity plot further confirms the consistency of this feature across both datasets.
% The plot examines whether locations with similar subjective feature distributions also exhibit similar objective feature state transitions. 

This is further evident in Figure~\ref{fig:state_features}, where the dominant state of each individual (in both cohorts) corresponds to patterns in the subjective sleep-impairment measures. This suggests that the learned states capture structured relationships between wearable signals and subjective assessments, and that this structure transfers effectively to a different population with a distinct distribution of observations. As the bottom panel shows, wearable data can be highly noisy, which makes extracting meaningful signals challenging. In Appendix~\ref{app_subsection: bump_crohns_analysis}, we further report the correlation between input features and state probabilities, highlighting the consistency of the wearable signals. These findings illustrate the potential of \hdpflow\ for uncovering latent patterns in complex, real-world settings.
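
A minimal Python sketch of the two quantities mentioned above, the dominant state of an individual and a feature-state correlation matrix, is given below. It assumes time-aligned per-time-step feature matrices and posterior state probabilities, takes the dominant state to be the most frequent hard assignment, and uses Pearson correlation; these choices and the names \texttt{dominant\_state} and \texttt{feature\_state\_correlation} are hypothetical and may differ from the analysis in the appendix.

```python
import numpy as np

def dominant_state(state_probs):
    """Most frequent argmax state for one individual; state_probs has shape (T, K)."""
    # Ties resolve to the lowest state index (np.argmax behavior).
    return int(np.bincount(state_probs.argmax(axis=1)).argmax())

def feature_state_correlation(features, state_probs):
    """Pearson correlation between each input feature and each state's probability.

    features: (T, D) wearable features; state_probs: (T, K) posterior probabilities.
    Returns a (D, K) matrix; zero-variance columns yield NaN.
    """
    x = features - features.mean(axis=0)
    y = state_probs - state_probs.mean(axis=0)
    cov = x.T @ y / len(x)                          # population covariance (ddof=0)
    denom = np.outer(x.std(axis=0), y.std(axis=0))  # matching ddof=0 std devs
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / denom

# Toy example: one rising feature, two states whose probabilities sum to one.
features = np.array([[1.0], [2.0], [3.0], [4.0]])
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.4, 0.6], [0.1, 0.9]])
print(feature_state_correlation(features, probs))  # strongly negative, strongly positive
print(dominant_state(probs))
```
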

% Based on log-likelihood uncertainty, we identified 4 patients in the Crohn's dataset and 3 participants in the Bump dataset with high aleatoric uncertainty. These individuals were excluded from the analysis.

% Both dropout sensitivity (\(\approx 0.003\)) and noise sensitivity (\(\approx 0.004\)) are relatively small, indicating that \hdpflow is fairly robust to input perturbations. As expected, the perturbation sensitivity for the Bump data, when the model is trained on Crohn's data, is higher, with dropout sensitivity (\(\approx 0.005\)) and noise sensitivity (\(\approx 0.007\)). Without sleep features, both sensitivities increased to 0.01. 

%\paragraph{Implementation detail: } Softmax instead of Softplus for beta
