Keywords: Transformers, RNN, HMM, representation learning, expressive power
Abstract: This paper investigates the capability of Transformers in learning a fundamental sequential model --- the Hidden Markov Model (HMM). We design various types of HMM examples and variants inspired by theory, and conduct extensive experiments testing and comparing the performance of Transformers and Recurrent Neural Networks (RNNs). Our experiments reveal three important findings: (1) Transformers can effectively learn a large number of HMMs, but this requires the depth of the Transformer to be at least logarithmic in the sequence length; (2) There are challenging HMMs that Transformers struggle to learn while RNNs succeed. We also consistently observe that Transformers underperform RNNs in both training speed and testing accuracy across all tested HMM models. (3) Long mixing times and the lack of access to intermediate latent states significantly degrade Transformers' performance, but have much less impact on RNNs' performance. To address the limitations of Transformers in modeling HMMs, we demonstrate that a variant of Chain-of-Thought (CoT) applied in the training phase, called \emph{block CoT}, helps Transformers reduce evaluation error and learn longer sequences, at the cost of increased training time. Finally, we complement our empirical findings with theoretical results proving the expressiveness of Transformers in approximating HMMs with logarithmic depth.
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10495