Keywords: In-context learning; LLM; Hidden Markov models; Neuroscience; Decision-making; Animal behavior.
Abstract: Hidden Markov Models (HMMs) are fundamental tools for modeling sequential data with latent states that follow Markovian dynamics.
However, fitting them to real-world datasets remains challenging, both statistically and computationally.
In this work, we demonstrate that pre-trained large language models (LLMs) can effectively model data generated by HMMs through in-context learning (ICL) — their ability to learn patterns from examples within the input context.
We evaluate LLMs' performance on diverse synthetic HMMs, showing that their prediction accuracy converges to the theoretical optimum. We discover novel scaling trends influenced by HMM properties and provide theoretical conjectures for these empirical observations. Furthermore, we present practical guidelines for scientists on using ICL as a diagnostic tool for complex data. Applied to real-world animal decision-making tasks, ICL achieves performance competitive with models designed by human experts. Our results demonstrate the potential of ICL to advance our understanding of LLMs' capabilities while opening new avenues for discovering biological mechanisms and hidden structures in real-world phenomena.
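To make the evaluation setup concrete, the following is a minimal illustrative sketch (not the paper's code): it samples an observation sequence from a small synthetic HMM and computes, via the forward algorithm, the Bayes-optimal next-observation distribution that an ICL predictor's accuracy would be compared against. The two-state, two-symbol parameters and all variable names are assumptions chosen for illustration.

```python
# Sketch: synthetic HMM sampling and the optimal next-observation predictor.
# The HMM parameters below are arbitrary illustrative choices, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Transition matrix A[i, j] = P(z_{t+1}=j | z_t=i); emission B[i, k] = P(x_t=k | z_t=i)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.75, 0.25],
              [0.30, 0.70]])
pi = np.array([0.5, 0.5])  # initial hidden-state distribution

def sample_hmm(T):
    """Draw a length-T observation sequence from the HMM."""
    z = rng.choice(2, p=pi)
    xs = []
    for _ in range(T):
        xs.append(int(rng.choice(2, p=B[z])))
        z = rng.choice(2, p=A[z])
    return xs

def optimal_next_obs_dist(xs):
    """Forward algorithm: filter the hidden-state belief, then return the
    predictive distribution over the next observation (the theoretical optimum)."""
    alpha = pi * B[:, xs[0]]
    alpha /= alpha.sum()
    for x in xs[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()
    return (alpha @ A) @ B  # P(x_{t+1} = k | x_1, ..., x_t)

obs = sample_hmm(200)
print(optimal_next_obs_dist(obs))
# An ICL evaluation would serialize `obs` as tokens in the LLM's prompt and compare
# the model's next-token probabilities against this optimal distribution.
```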
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 9505