Abstract: Discretization is widely used in time-series analysis to convert continuous observations into
symbolic sequences before sequence modeling. Its effect on forecasting, however, is not
merely representational: discretization may preserve the predictive structure of the original
process, or it may destroy it by merging histories that imply different future distributions.
In this paper, we study discretization
through the lens of predictive states and Hankel-rank-based predictive complexity.
We first formalize predictive-sufficient discretization and review how predictive-state
collapse under coarsening reduces predictive complexity. We then introduce synthetic same-\(K\)
hidden Markov model families that share the same hidden-state cardinality but exhibit different
Bayes-level context gaps. These families allow us to separate nominal hidden-state size from
observable predictive difficulty in a controlled setting. Our experiments show that forecasting
difficulty varies markedly across families with the same latent-state cardinality, so hidden-state
count alone does not determine it. Moreover, learner-side recovery of Bayes-level context
sensitivity is family-dependent and
non-monotone in hidden dimension: some families benefit from moderate increases in representation
size, whereas others degrade when the model dimension becomes unnecessarily large. Taken together,
these results suggest that, in the present same-\(K\) setting, representation requirements are not
explained by hidden-state count alone, but also depend on the family-specific predictive structure
that remains observable after discretization.
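The notion of a Bayes-level context gap can be made concrete with a small numerical sketch. The snippet below is illustrative only and is not the paper's construction: it builds two hypothetical 3-state HMM families with the same hidden-state cardinality \(K\) (the matrices T_A, O_A, T_B, O_B and the helper context_gap are assumptions introduced here) and measures, under the stationary distribution, how much the Bayes-optimal next-symbol prediction changes when conditioning on a length-2 history instead of the last symbol alone.

```python
# Illustrative sketch (not the authors' code): compare a "Bayes-level context gap"
# for two HMM families that share the same hidden-state cardinality K.
import itertools
import numpy as np

def stationary(T):
    """Stationary distribution of a row-stochastic transition matrix T."""
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def joint_prob(obs, T, O, pi):
    """P(o_1, ..., o_n) via the forward recursion of an HMM (T transitions, O emissions)."""
    alpha = pi * O[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * O[:, o]
    return alpha.sum()

def context_gap(T, O, max_len=2):
    """Expected total-variation gap between the Bayes-optimal next-symbol
    prediction given a length-max_len history and given only the last symbol."""
    pi = stationary(T)
    n_obs = O.shape[1]
    gap = 0.0
    for hist in itertools.product(range(n_obs), repeat=max_len):
        p_hist = joint_prob(list(hist), T, O, pi)
        if p_hist < 1e-12:
            continue
        # Next-symbol distribution given the full history ...
        p_long = np.array([joint_prob(list(hist) + [o], T, O, pi) for o in range(n_obs)])
        p_long /= p_long.sum()
        # ... and given only the most recent symbol.
        p_short = np.array([joint_prob([hist[-1], o], T, O, pi) for o in range(n_obs)])
        p_short /= p_short.sum()
        gap += p_hist * 0.5 * np.abs(p_long - p_short).sum()
    return gap

K = 3  # same hidden-state cardinality for both illustrative families
# Family A: uniform (memoryless) hidden transitions -- the gap should be ~0.
T_A = np.full((K, K), 1.0 / K)
O_A = np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]])
# Family B: slowly mixing hidden chain -- longer histories remain informative.
T_B = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])
O_B = O_A.copy()

print("context gap, family A:", context_gap(T_A, O_A))
print("context gap, family B:", context_gap(T_B, O_B))
```

In this toy comparison both families have \(K = 3\) hidden states, yet family A's observable process is essentially order-1 while family B's benefits from longer context, mirroring the separation of hidden-state count from observable predictive difficulty described above.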
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Daniel_Durstewitz1
Submission Number: 8683