Transformers are Provably Optimal In-context Estimators for Wireless Communications
TL;DR: Transformers can provably learn to estimate in-context.
Abstract: Pre-trained transformers can adapt to new tasks through in-context learning (ICL), using only a small number of prompt examples and no explicit model optimization.
The canonical communication problem of estimating transmitted symbols from received observations can be modeled as an in-context learning problem: received observations are a noisy function of the transmitted symbols, and this function is characterized by an unknown parameter whose statistics depend on an unknown latent context. This problem, which we term in-context estimation (ICE), has significantly greater complexity than the extensively studied linear regression problem.
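As a rough schematic of this setup (the notation here is assumed for illustration and is not taken from the paper), the ICE problem can be written as

$$
y_k = f_{\theta}(x_k) + n_k, \qquad \theta \sim p(\theta \mid c),
$$

where $c$ is the unknown latent context, $\theta$ the unknown channel parameter, $x_k$ the transmitted symbols, and $n_k$ noise. Given a prompt of pairs $(y_1, x_1), \dots, (y_N, x_N)$ generated under the same $\theta$, the estimator must recover $x_{N+1}$ from a new observation $y_{N+1}$, ideally approaching the conditional-mean estimate $\mathbb{E}\!\left[x_{N+1} \mid y_{N+1}, (y_k, x_k)_{k \le N}\right]$, which is a non-linear function of the prompt because the context is never revealed.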
The optimal solution to the ICE problem is a non-linear function of the underlying context. In this paper, we prove that, for a subclass of such problems, a single-layer softmax attention transformer (SAT) computes the optimal solution of the above estimation problem in the limit of large prompt length. We also prove that the optimal configuration of such a transformer is indeed the minimizer of the corresponding training loss. Further, we empirically demonstrate that multi-layer transformers efficiently solve broader in-context estimation problems. Through extensive simulations, we show that transformers solving ICE problems significantly outperform standard approaches. Moreover, with just a few context examples, the transformer matches the performance of an estimator with perfect knowledge of the latent context.
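For intuition only, the sketch below shows how a single softmax-attention readout can act as an in-context symbol estimator on a toy scalar fading channel. This is not the authors' code or model; the channel, the two-context prior, and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sample_task(n_context, snr_db=10.0):
    """Toy ICE task: y = h * x + n, with the channel gain h drawn from one of
    two latent contexts (unknown to the estimator)."""
    context = rng.integers(2)
    h = rng.normal(loc=[1.0, -1.0][context], scale=0.3)
    sigma = 10 ** (-snr_db / 20)
    x = rng.choice([-1.0, 1.0], size=n_context + 1)          # BPSK symbols
    y = h * x + sigma * rng.normal(size=n_context + 1)       # noisy observations
    return x, y

def attention_estimate(y_ctx, x_ctx, y_query, temp=0.1):
    """Single softmax-attention readout: attend over context observations
    (keys) with the new observation as the query, then output a weighted
    combination of the context symbols (values)."""
    scores = -(y_ctx - y_query) ** 2 / temp
    w = softmax(scores)
    return np.sign(w @ x_ctx)                                # hard symbol decision

n_context, trials, errors = 32, 2000, 0
for _ in range(trials):
    x, y = sample_task(n_context)
    errors += attention_estimate(y[:-1], x[:-1], y[-1]) != x[-1]
print(f"symbol error rate over {trials} toy tasks: {errors / trials:.3f}")
```

The point of the sketch is that the attention weights adapt to whichever channel realization generated the prompt, so the same fixed estimator works across latent contexts without ever being told the channel explicitly, mirroring the in-context adaptation studied in the paper.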
Submission Number: 529