How does representation impact in-context learning: An exploration on a synthetic task

17 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: In-context learning; representation learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: In-context learning, i.e., learning from in-context samples, is an impressive ability of Transformers. However, the exact mechanism behind this learning process remains unclear. In this study, we explore it from the relatively unexamined perspective of representation learning. In in-context learning scenarios, the representation becomes more complex because it can be influenced by both the model weights and the in-context samples. We refer to these two conceptual aspects of the representation as the in-weights component and the in-context component, respectively. To examine the impact of these two components on in-context learning ability, we create a novel synthetic task that allows us to develop two probes - an in-weights probe and an in-context probe - to evaluate the respective components. Our findings reveal that the quality of the in-context component is closely related to in-context learning performance, indicating a connection between in-context learning and representation learning. We further find that a well-developed in-weights component benefits the learning of the in-context component, suggesting that in-weights learning should serve as the foundation for in-context learning. To gain a deeper understanding of the in-context learning mechanism and the importance of the in-weights component, we show by construction that a simple Transformer, which performs in-context learning through pattern matching and a copy-paste mechanism, can achieve performance comparable to a more complex, best-tuned Transformer under the assumption of a perfect in-weights component. Overall, our findings from the representation learning perspective provide valuable insights into new approaches for enhancing in-context capacity.
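To make the pattern-matching and copy-paste idea mentioned in the abstract concrete, the sketch below gives a minimal, hypothetical illustration: in-context inputs are embedded by a fixed "in-weights" representation, the query is matched against these embeddings, and the label of the best match is copied as the prediction. The function and variable names are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def copy_paste_predict(context_inputs, context_labels, query, embed):
    """Toy sketch: match the embedded query against the embedded in-context
    inputs (pattern matching) and return the label of the closest one
    (copy-paste). `embed` stands in for a perfect in-weights component."""
    keys = np.stack([embed(x) for x in context_inputs])  # in-context representations
    scores = keys @ embed(query)                         # similarity-based matching
    best = int(np.argmax(scores))
    return context_labels[best]                          # copy the matched label

# Hypothetical usage with an identity embedding as the "perfect" in-weights component.
if __name__ == "__main__":
    ctx_x = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    ctx_y = ["class_A", "class_B"]
    print(copy_paste_predict(ctx_x, ctx_y, np.array([0.9, 0.1]), embed=lambda v: v))
    # -> "class_A"
```

Under this reading, the quality of the embedding (the in-weights component) determines whether matching in the in-context stage can succeed, which is consistent with the abstract's claim that in-weights learning underpins in-context learning.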
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 808