Learning Fast and Slow: Representations for In-Context Weight Modulation

Published: 18 Jun 2024, Last Modified: 02 Jul 2024
Venue: ICML 2024 Workshop ICL Poster
License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: transformers, representation learning, in-context learning, weight generation, specialization, variational autoencoders
Abstract: Most natural sequential processes involve a spectrum of time scales: from fast-changing variations responsible for local structure to slowly-changing dynamics, akin to memory, that capture context information. Here we propose a method for learning such a disentangled slow-fast representation in the activations of a conventional Transformer model. We accomplish this with regularization techniques inspired by contrastive learning. The approach can be further analyzed by adopting a Gaussian process prior, which yields a Variational Autoencoder interpretation of the Transformer model. We evaluate our techniques on synthetic in-context learning tasks and widely-used text benchmarks, where we show the emergence of disentangled representations. We then propose a HyperNetwork-inspired approach in which the slow representations modulate the weights of the Transformer applied to the fast, short-range activations. We demonstrate that adding such modulation makes it possible to generate models specialized to a particular context on the fly.
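For illustration, the sketch below shows one way such HyperNetwork-style weight modulation could look in PyTorch: a slowly-varying, per-sequence context code generates a low-rank additive update to the weights applied to the fast, token-level activations. This is a minimal sketch under stated assumptions, not the authors' implementation; the class name, dimensions, and low-rank parameterization are illustrative choices.

# Minimal sketch (not the paper's code): a slow context vector generates a
# low-rank modulation of the linear weights applied to fast activations.
import torch
import torch.nn as nn

class SlowFastModulatedLinear(nn.Module):
    def __init__(self, d_fast: int, d_slow: int, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(d_fast, d_fast)        # shared "fast" weights
        self.to_u = nn.Linear(d_slow, d_fast * rank) # hypernetwork heads mapping the
        self.to_v = nn.Linear(d_slow, d_fast * rank) # slow code to a rank-`rank` update
        self.rank = rank
        self.d_fast = d_fast

    def forward(self, fast: torch.Tensor, slow: torch.Tensor) -> torch.Tensor:
        # fast: (batch, seq, d_fast)  token-level activations
        # slow: (batch, d_slow)       per-sequence context representation
        u = self.to_u(slow).view(-1, self.d_fast, self.rank)  # (batch, d_fast, rank)
        v = self.to_v(slow).view(-1, self.rank, self.d_fast)  # (batch, rank, d_fast)
        delta_w = torch.bmm(u, v) / self.rank                 # context-specific weight update
        # base projection plus the context-dependent modulation
        return self.base(fast) + torch.bmm(fast, delta_w.transpose(1, 2))

if __name__ == "__main__":
    layer = SlowFastModulatedLinear(d_fast=64, d_slow=32)
    fast = torch.randn(4, 16, 64)   # fast, short-range activations
    slow = torch.randn(4, 32)       # slow context code (e.g. pooled over the sequence)
    print(layer(fast, slow).shape)  # torch.Size([4, 16, 64])

In this sketch the low-rank factorization keeps the hypernetwork output small while still letting each context produce its own effective weight matrix on the fly.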
Submission Number: 25