Where Does In-context Machine Translation Happen in Large Language Models?

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: In-context Machine Translation, Interpretability
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Through a series of layer-wise context-masking, positional-masking, and $L_0$ attention-head gating experiments, we attempt to locate where in-context MT happens in LLMs.
Abstract: Self-supervised large language models have demonstrated the ability to perform Machine Translation (MT) via in-context learning, but little is known about where in the model MT is performed with respect to the prompt instructions and demonstration examples. In this work, we attempt to characterize the region of layer-wise attention heads where GPT models transition from in-context learners to translation models. Through a series of layer-wise context-masking experiments on GPTNeo 2.7B and Bloom 3B, we demonstrate evidence of a "task recognition" point where the translation task is encoded into the input representations and attention to the context is no longer necessary. Our layer-wise fine-tuning experiments indicate that the most effective layers for MT fine-tuning are the layers critical to task recognition. Next, we examine redundancy in the layers following task recognition, observing that masking these later layers does not significantly hurt performance. Finally, we train discrete attention head gates with $L_0$ regularisation and find evidence that the most prunable heads occur after task recognition.
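As a rough illustration of the context-masking setup described in the abstract, the sketch below blocks attention from the test-example positions to the instruction/demonstration context from a chosen layer onward. This is a minimal sketch, not the authors' implementation: the toy single-head attention function, tensor shapes, and layer indices are illustrative assumptions.

```python
import torch


def context_block_mask(seq_len: int, context_len: int) -> torch.Tensor:
    """Boolean mask that is True where attention should be blocked:
    queries at positions >= context_len (the test source/target) may not
    attend to keys at positions < context_len (instructions + demonstrations)."""
    blocked = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    blocked[context_len:, :context_len] = True
    return blocked


def attention_with_context_masking(q, k, v, layer_idx, mask_from_layer, context_len):
    """Scaled dot-product attention with a standard causal mask, plus context
    masking applied only at layers >= mask_from_layer."""
    seq_len = q.shape[-2]
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    causal = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    scores = scores.masked_fill(causal, float("-inf"))
    if layer_idx >= mask_from_layer:
        scores = scores.masked_fill(
            context_block_mask(seq_len, context_len), float("-inf")
        )
    return torch.softmax(scores, dim=-1) @ v


# Toy usage: a 16-token sequence whose first 10 tokens are the prompt + demonstrations.
q = k = v = torch.randn(16, 64)
out = attention_with_context_masking(
    q, k, v, layer_idx=20, mask_from_layer=15, context_len=10
)
print(out.shape)  # torch.Size([16, 64])
```

Sweeping `mask_from_layer` over the model's depth and measuring translation quality at each setting is the kind of layer-wise probe the abstract alludes to; the layer beyond which such masking stops hurting performance would mark the "task recognition" point.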
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7728