Keywords: Optimal Transport, Assignment, In-context Learning, Transformers, Sorting
Abstract: How can language models align words in translated sentences that have different syntactic structures? Can they compute edit distances, or even sort arbitrary sequences? These tasks are all instances of *assignment problems*. We prove that a carefully engineered prompt enables a transformer to approximate the solution of the assignment problem: the prompt induces the attention layers to simulate gradient descent on the dual objective of the assignment problem. We establish an **explicit approximation bound** that improves with transformer depth. A striking implication is that a single transformer can sort inputs of **arbitrary** length, proving a form of *out-of-distribution generalization*.
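For concreteness, below is a minimal NumPy sketch (not from the paper) of the mechanism the abstract alludes to: gradient ascent on the entropy-regularized dual of the assignment problem (equivalently, gradient descent on its negation), followed by sorting-as-assignment against a fixed sorted reference grid. The function name, step size, regularization strength `eps`, and the reference grid `y` are illustrative assumptions, not details taken from the submission.

```python
import numpy as np

def dual_gradient_ascent(C, eps=0.05, lr=0.05, n_steps=5000):
    """Gradient ascent on the entropy-regularized dual of the assignment
    problem with uniform marginals (illustrative sketch, not the paper's code).

    Dual objective (up to an additive constant):
        D(f, g) = <f, a> + <g, b>
                  - eps * sum_ij a_i b_j exp((f_i + g_j - C_ij) / eps)
    """
    n, m = C.shape
    a = np.full(n, 1.0 / n)   # uniform source weights
    b = np.full(m, 1.0 / m)   # uniform target weights
    f = np.zeros(n)           # dual potential on rows
    g = np.zeros(m)           # dual potential on columns
    for _ in range(n_steps):
        K = a[:, None] * b[None, :] * np.exp((f[:, None] + g[None, :] - C) / eps)
        f += lr * (a - K.sum(axis=1))   # grad_f D = a - row sums of K
        g += lr * (b - K.sum(axis=0))   # grad_g D = b - column sums of K
    P = a[:, None] * b[None, :] * np.exp((f[:, None] + g[None, :] - C) / eps)
    return f, g, P

# Sorting as assignment: match each input x_i to a slot of an increasing
# reference grid y_j; reading inputs off slot by slot sorts x.
x = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
y = np.linspace(0.0, 1.0, len(x))        # fixed sorted "anchor" sequence
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost matrix
_, _, P = dual_gradient_ascent(C)
order = P.argmax(axis=0)                 # slot j is filled by input order[j]
print(x[order])                          # approximately sorted(x)
```

In this reading, the cost matrix plays the role of query-key interactions and the exponential of the (shifted) potentials is a softmax-like attention pattern; the paper's construction shows how a prompt can make actual attention layers carry out updates of this kind.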
Submission Number: 2