Transformers Can Perform Distributionally-robust Optimisation through In-context Learning

Published: 18 Jun 2024, Last Modified: 11 Jul 2024
Venue: ICML 2024 Workshop ICL Poster
License: CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: In-Context Learning, Distributionally Robust Optimisation
TL;DR: We empirically show that transformers can perform distributionally robust optimisation.
Abstract: Recent empirical and theoretical studies have shown that, through in-context learning, transformers can solve simple machine-learning problems such as linear regression and decision-forest prediction. We extend this line of research on the power of transformers' in-context learning. We show experimentally that transformers can in-context learn a range of function classes even in the presence of multiple types of perturbations, which means that, when trained on appropriate in-context learning tasks, transformers can perform distributionally robust optimisation (DRO) for those function classes. Our experiments cover problems studied in the DRO community that involve a single type of perturbation, specified in terms of either total-variation distance or Wasserstein distance, as well as combinations of multiple perturbation types. Our findings show that transformers solve the DRO problems in all these cases. They also show that while standard DRO algorithms are usually limited to linear models, transformers can, through in-context learning, perform DRO for non-linear models such as kernel regression models and shallow neural networks.
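The page does not include code, but as a rough illustration of what "appropriate in-context learning tasks" with perturbations might look like in the style of prior in-context learning work (e.g. interleaved (x, y) prompts as in Garg et al., 2022), here is a minimal Python sketch. The function name `sample_perturbed_icl_prompt`, the perturbation model, and all parameter choices are illustrative assumptions, not the authors' actual construction.

```python
import torch

def sample_perturbed_icl_prompt(n_points=32, dim=8, eps=0.1):
    """Hypothetical sketch: one in-context linear-regression task whose
    context labels are perturbed inside a norm-bounded ball, so that a
    transformer trained on many such tasks is pushed towards a robust
    (DRO-style) predictor. All choices here are illustrative."""
    w = torch.randn(dim)                    # ground-truth linear function
    X = torch.randn(n_points, dim)          # context inputs
    y_clean = X @ w                         # clean labels
    # Bounded label perturbation (a stand-in for the TV-/Wasserstein-ball
    # perturbations mentioned in the abstract; the paper's construction
    # may differ).
    delta = eps * torch.sign(torch.randn(n_points))
    y = y_clean + delta
    # Interleave (x_i, y_i) pairs into a single prompt sequence, padding
    # each scalar label to the input dimension, as in standard
    # in-context-learning setups.
    y_tok = torch.zeros(n_points, dim)
    y_tok[:, 0] = y
    prompt = torch.stack([X, y_tok], dim=1).reshape(2 * n_points, dim)
    return prompt, w
```

In this interleaved format, a transformer trained to predict each y_i from the preceding tokens across many sampled tasks would, if the abstract's claim holds, be driven towards the distributionally robust predictor rather than the ordinary least-squares fit.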
Submission Number: 36