Abstract: We study the problem of clustering $T$ trajectories of length $H$, each generated by one of $K$ unknown ergodic Markov chains over a finite state space of size $S$.
We derive an instance-dependent, high-probability lower bound on the clustering error rate, governed by the stationary-weighted KL divergence between transition kernels.
We then propose a two-stage algorithm: Stage I applies spectral clustering via a new injective Euclidean embedding for ergodic Markov chains, a contribution of independent interest enabling sharp concentration results; Stage II refines clusters with a single likelihood-based reassignment step.
We prove that our algorithm achieves near-optimal clustering error with high probability under reasonable requirements on $T$ and $H$.
Preliminary experiments support our approach, and we conclude with discussions of its limitations and extensions.
Code Dataset Promise: Yes
Code Dataset Url: https://github.com/nick-jhlee/optimal-mixture-markov
Signed Copyright Form: pdf
Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.
Submission Number: 1428
Loading