Learning with little mixing

Ingvar Ziemann; Stephen Tu

Learning with little mixing

Ingvar Ziemann, Stephen Tu

Published: 31 Oct 2022, Last Modified: 06 Apr 2025NeurIPS 2022 AcceptReaders: Everyone

Keywords: Learning Theory, Nonlinear Dynamical Systems, Learning with dependent data

TL;DR: We show that for mixing systems under an easiness condition, the rate of convergence of the LSE for rather general hypothesis classes has iid data like performance

Abstract: We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the $L^2$ and $L^{2+\epsilon}$ norms are equivalent, finite state irreducible and aperiodic Markov chains, various parametric models, and a broad family of infinite dimensional $\ell^2(\mathbb{N})$ ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.

Supplementary Material: pdf

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/learning-with-little-mixing/code)

12 Replies

Loading