Does learning the right latent variables necessarily improve in-context learning?

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We test whether explicit latent-variable inference leads to better performance in in-context learning and identify the benefits and shortcomings of such methods.
Abstract: Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or instead exploit heuristics and statistical shortcuts through attention layers. In this paper, we systematically investigate the effect of explicitly inferring task latents by minimally modifying the Transformer architecture with a bottleneck to prevent shortcuts and incentivize structured solutions. We compare it against standard Transformers across various ICL tasks and find that contrary to intuition and recent works, there is little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.
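To make the abstract's setup concrete, below is a minimal sketch, assuming details not given in the abstract (PyTorch, a mean-pooled bottleneck, an MLP prediction head, and hypothetical names such as `sample_icl_task` and `BottleneckICL`); it is an illustration of the general idea, not the paper's actual architecture. It shows (i) a factorized in-context linear-regression task, where examples are i.i.d. given the task latent w (the linear coefficients), and (ii) a Transformer whose predictor can only access the context through a single latent vector z, so no shortcut attention to raw context examples is possible.

```python
# Illustrative sketch only: factorized ICL data + a bottlenecked in-context learner.
import torch
import torch.nn as nn


def sample_icl_task(batch=32, n_context=16, d=8, noise=0.1):
    """Each sequence has its own task latent w; examples are i.i.d. given w."""
    w = torch.randn(batch, d, 1)                      # task latent (linear coefficients)
    X = torch.randn(batch, n_context + 1, d)          # context points + one query point
    y = (X @ w).squeeze(-1) + noise * torch.randn(batch, n_context + 1)
    # Return context pairs and the held-out query.
    return X[:, :-1], y[:, :-1], X[:, -1], y[:, -1]


class BottleneckICL(nn.Module):
    """All context information must pass through one latent vector z per task."""

    def __init__(self, d=8, d_model=64, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(d + 1, d_model)        # embed (x_i, y_i) pairs as tokens
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Sequential(                    # predictor sees only (z, x_query)
            nn.Linear(d_model + d, d_model), nn.ReLU(), nn.Linear(d_model, 1)
        )

    def forward(self, X_ctx, y_ctx, x_query):
        tokens = self.embed(torch.cat([X_ctx, y_ctx.unsqueeze(-1)], dim=-1))
        z = self.encoder(tokens).mean(dim=1)          # the bottleneck: one vector per task
        return self.head(torch.cat([z, x_query], dim=-1)).squeeze(-1)


# Usage sketch: train by regressing the query target from the bottlenecked context.
model = BottleneckICL()
X_ctx, y_ctx, x_q, y_q = sample_icl_task()
loss = nn.functional.mse_loss(model(X_ctx, y_ctx, x_q), y_q)
```

In the paper's comparison, such a bottlenecked model is evaluated against a standard Transformer that attends to the context directly; the abstract's finding is that the bottleneck yields interpretable latents but little gain in out-of-distribution prediction.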
Lay Summary: Large-scale machine learning models can learn to solve new problems just by looking at a few examples—this is called in-context learning. In this work, we explore whether it’s better to first clearly figure out the hidden rules behind a task and then solve it, or if it’s just as effective to let the model pick up those rules on its own, without making them explicit.
Primary Area: General Machine Learning->Transfer, Multitask and Meta-learning
Keywords: in-context learning, transformers, attention, latent variable, shortcuts
Submission Number: 7540