Keywords: sequence learning, recurrent learning, streaming learning, language models
Abstract: Sequence data are inherently dependent, yet sequence learners (e.g., language models) are often trained as if samples were independent and identically distributed (IID) by segmenting long streams into short, shuffled chunks, breaking natural continuity and undermining long-range credit assignment. We formalize multi-stream sequence learning, a continuity-preserving training framework that presents multiple streams in their natural order; this setting has often been conflated with particular solution methods and remains underexplored. To support this paradigm, we propose Memora, a recurrent-only architecture whose persistent hidden states make it better suited to sequence learning than architectures trained with IID chunking. Memora is built around our Gated Linear Recurrent Unit (GLRU), a lightweight unit designed for efficient parallel training and robust temporal reasoning. It learns effectively on long byte-level sequences and remains reliable even in the strict streaming setting, where data arrive online one byte at a time. Our experiments show that continuity-preserving training outperforms IID chunking, underscoring the importance of continuity in sequence learning.
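The abstract describes a recurrent unit whose hidden state persists across consecutive chunks of a stream rather than being reset per chunk. The GLRU equations are not given here, so the sketch below is only an illustrative stand-in: a generic element-wise gated linear recurrence with an assumed update rule h_t = g_t * h_{t-1} + (1 - g_t) * W_x x_t, g_t = sigmoid(W_g x_t), where the class name `GatedLinearRecurrence` and all parameter names are hypothetical.

```python
# Hypothetical sketch of a gated linear recurrence with a persistent hidden
# state, in the spirit of the unit the abstract describes. NOT the paper's
# GLRU: the update rule, names, and shapes below are assumptions.
import torch
import torch.nn as nn


class GatedLinearRecurrence(nn.Module):
    """Element-wise gated linear recurrence; hidden state is carried across calls."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.in_proj = nn.Linear(input_dim, hidden_dim)    # candidate input
        self.gate_proj = nn.Linear(input_dim, hidden_dim)  # retain/forget gate

    def forward(self, x: torch.Tensor, h: torch.Tensor | None = None):
        # x: (batch, time, input_dim); h: (batch, hidden_dim) from the previous chunk
        batch, time, _ = x.shape
        if h is None:
            h = x.new_zeros(batch, self.in_proj.out_features)
        cand = self.in_proj(x)                   # (batch, time, hidden_dim)
        gate = torch.sigmoid(self.gate_proj(x))  # (batch, time, hidden_dim)
        outputs = []
        for t in range(time):                    # sequential scan, for clarity
            h = gate[:, t] * h + (1.0 - gate[:, t]) * cand[:, t]
            outputs.append(h)
        return torch.stack(outputs, dim=1), h    # return state for the next chunk


# Continuity-preserving usage: feed consecutive chunks of one stream and carry
# the state forward, instead of resetting it per chunk as IID training would.
if __name__ == "__main__":
    rnn = GatedLinearRecurrence(input_dim=256, hidden_dim=128)
    state = None
    for chunk in torch.randn(4, 3, 32, 256).unbind(0):  # 4 consecutive chunks
        out, state = rnn(chunk, state)
```

The loop over time is written sequentially for readability; because the recurrence is linear and element-wise in h, the same computation could in principle be parallelized over time with an associative scan, which is presumably what "efficient parallel training" refers to.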
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22149