HT-Transformer: Event Sequences Classification by Accumulating Prefix Information with History Tokens
Keywords: Event Sequences, Classification, Transformer, Embedding Extraction
TL;DR: We propose the concept of history tokens for unsupervised embedding pretraining in Transformer models using a next-token prediction objective.
Abstract: Deep learning has achieved strong results in modeling sequential data, including event sequences, temporal point processes, and irregular time series. Recently, transformers have largely replaced recurrent networks in these tasks. However, transformers often underperform RNNs in sequence classification tasks that aim to predict future targets. The reason behind this performance gap remains largely unexplored. In this paper, we identify a key limitation of transformers: the absence of a single state vector that provides a compact and effective representation of the entire sequence. Additionally, we show that contrastive pretraining of embedding vectors fails to capture local context, which is crucial for accurate prediction. To address these challenges, we introduce history tokens, a novel concept that facilitates accumulating historical information during next-token prediction pretraining. Our approach significantly improves transformer-based models, achieving impressive results in finance, e-commerce, and healthcare tasks.
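The abstract does not provide implementation details, so the following is only a minimal sketch of how a history-token mechanism could look in practice: a learnable token is appended to the event prefix so that, under a causal mask, it can attend to all preceding events and serve as a compact sequence embedding, while the model is pretrained with a next-token prediction objective. All names (`HistoryTokenEncoder`, `n_history`, the mean-pooling of history states, etc.) are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch (assumption, not the paper's code): a transformer encoder
# with learnable "history tokens" appended after the event prefix. Under the
# mask below they attend to the whole prefix and act as a compact embedding.
import torch
import torch.nn as nn


class HistoryTokenEncoder(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 64, n_heads: int = 4,
                 n_layers: int = 2, n_history: int = 1):
        super().__init__()
        self.event_emb = nn.Embedding(vocab_size, d_model)
        # Learnable history tokens that accumulate prefix information.
        self.history_tokens = nn.Parameter(torch.randn(n_history, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.next_event_head = nn.Linear(d_model, vocab_size)

    def forward(self, events: torch.Tensor):
        # events: (batch, seq_len) integer event types
        b, t = events.shape
        x = self.event_emb(events)                                  # (b, t, d)
        hist = self.history_tokens.unsqueeze(0).expand(b, -1, -1)   # (b, h, d)
        x = torch.cat([x, hist], dim=1)                             # append history tokens
        # Causal mask over events; history tokens may see the entire prefix.
        total = x.size(1)
        mask = torch.triu(torch.ones(total, total, dtype=torch.bool), diagonal=1)
        mask[t:, :t] = False  # history tokens attend to every event
        out = self.encoder(x, mask=mask)
        event_states, history_states = out[:, :t], out[:, t:]
        # Next-token logits for the pretraining objective; the pooled history
        # states are later reused as the sequence embedding for classification.
        logits = self.next_event_head(event_states)
        return logits, history_states.mean(dim=1)


# Usage: pretrain with cross-entropy on the next-event logits, then feed the
# returned embedding to a downstream classifier.
model = HistoryTokenEncoder(vocab_size=100)
events = torch.randint(0, 100, (8, 20))
logits, embedding = model(events)
print(logits.shape, embedding.shape)  # (8, 20, 100) and (8, 64)
```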
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 7456