Keywords: in-context learning, inductive biases, transformer
TL;DR: We modify the standard transformer architecture for in-context learning so that it explicitly learns the latent variables relevant to different tasks, yielding more interpretable solutions and enabling interventions.
Abstract: Transformer models have shown considerable success in modeling predictive problems across diverse domains. They have been shown to learn efficiently in-context, i.e., to solve new tasks without any further training when provided a few examples as context (ICL). While first observed in language, subsequent studies show that ICL also generalizes to a variety of algorithmic tasks. Recent research suggests that transformers may be implicitly modeling the posterior predictive distribution over the latent variables needed to solve different tasks. However, ICL diverges from standard Bayesian methods: it forgoes defining an explicit latent variable model and performing inference over it, in favor of an implicit mechanism. This raises a natural question: is there any benefit to explicitly factorizing knowledge, or are we better off letting the model implicitly decide an appropriate solution space? We conduct a thorough analysis to uncover both the advantages and limitations of the current ICL setting, and show that models which explicitly factorize knowledge can be more readily augmented with inductive biases that significantly boost performance when domain knowledge is available.
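A minimal sketch of the contrast the abstract draws, in standard Bayesian notation (the symbols here are illustrative, not taken from the paper): an explicit latent variable model answers a query $x_{n+1}$ given context $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ by inferring a posterior over a latent $z$ and marginalizing,

$$p(y_{n+1} \mid x_{n+1}, \mathcal{D}) = \int p(y_{n+1} \mid x_{n+1}, z)\, p(z \mid \mathcal{D})\, dz,$$

whereas standard ICL trains the transformer to approximate the left-hand side directly, without ever representing $p(z \mid \mathcal{D})$. Making $z$ explicit is what allows it to carry inductive biases and support interventions.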
Submission Number: 61