Abstract: In-context learning (ICL) is a valuable capability exhibited by Transformers pretrained on diverse sequence tasks. However, previous studies have observed that ICL often conflicts with the model’s inherent in-weight learning (IWL) ability. By examining the representation space learned by a toy model in synthetic experiments, we identify the shared encoding space for context and samples in Transformers as a potential source of this conflict. To address this, we modify the model architecture to separately encode the context and samples into two distinct spaces: a \textit{task representation space} and a \textit{sample representation space}. We model these two spaces under a simple yet principled framework, assuming a linear representational structure and treating them as a pair of dual spaces. Both theoretical analysis and empirical results demonstrate that our proposed architecture, CoQE, not only enhances ICL performance through improved representation learning, but also successfully reconciles ICL and IWL capabilities across all tested conditions.
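The core architectural idea of the abstract, encoding the context and the query sample into two separate spaces and pairing them for prediction, can be illustrated with a minimal sketch. This is not the authors' CoQE implementation; the module names, the mean-pooled task encoder, the elementwise dual pairing, and all dimensions are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's CoQE architecture): encode the context
# into a task representation space and the query into a sample representation
# space, then pair the two representations before a linear readout.
import torch
import torch.nn as nn


class DualSpaceICL(nn.Module):
    def __init__(self, input_dim: int, repr_dim: int, n_classes: int):
        super().__init__()
        # Task encoder: maps the context (in-context examples) to a task representation.
        self.task_encoder = nn.Sequential(
            nn.Linear(input_dim, repr_dim), nn.ReLU(), nn.Linear(repr_dim, repr_dim)
        )
        # Sample encoder: maps a single query sample into the sample representation space.
        self.sample_encoder = nn.Linear(input_dim, repr_dim)
        # Readout from the paired representation to class logits.
        self.readout = nn.Linear(repr_dim, n_classes)

    def forward(self, context: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # context: (batch, n_context, input_dim); query: (batch, input_dim)
        task_repr = self.task_encoder(context).mean(dim=1)   # (batch, repr_dim)
        sample_repr = self.sample_encoder(query)             # (batch, repr_dim)
        # Treat the two spaces as duals: pair them elementwise before the readout
        # (an assumed bilinear-style pairing, consistent with a linear structure).
        return self.readout(task_repr * sample_repr)         # (batch, n_classes)


# Usage: a random context of 8 examples and one query per batch element.
model = DualSpaceICL(input_dim=16, repr_dim=32, n_classes=4)
logits = model(torch.randn(2, 8, 16), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 4])
```

The design choice sketched here, keeping the context and query encoders as separate modules rather than a shared embedding, mirrors the abstract's claim that a shared encoding space is a potential source of the ICL/IWL conflict.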
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Andrew_Kyle_Lampinen1
Submission Number: 6805