ICL CIPHERS: Quantifying ``Learning'' in In-Context Learning via Substitution Ciphers

ACL ARR 2024 December Submission1535 Authors

16 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Recent works have suggested that In-Context Learning (ICL) operates in dual modes, i.e. task retrieval (remembering learned patterns from pre-training) and task learning (inference-time ``learning'' from demonstrations). However, disentangling these two modes remains a challenging goal. We introduce ICL ciphers, a class of task reformulations based on substitution ciphers borrowed from classic cryptography. In this approach, a subset of tokens in the in-context inputs is substituted with other (irrelevant) tokens, rendering English sentences less comprehensible to the human eye. However, by design, there is a latent, fixed pattern to this substitution, making it reversible. This bijective (reversible) cipher ensures that the task remains well-defined in some abstract sense, despite the transformations. It is a curious question whether LLMs can solve ICL ciphers with a bijective mapping, which requires deciphering the latent cipher. We show that LLMs are better at solving ICL ciphers with bijective mappings than the non-bijective (irreversible) baseline, providing a novel approach to quantify ``learning'' in ICL. While this gap is small, it is consistent across the board on four datasets and four model families. Finally, we examine LLMs' internal representations and identify evidence of their ability to decode the ciphered inputs.
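To make the contrast in the abstract concrete, the sketch below builds a toy bijective substitution cipher (a permutation of the vocabulary, hence reversible) and a non-bijective one (collisions allowed, hence irreversible), and applies it to a fraction of tokens in an input. This is a minimal illustration under assumed names and parameters, not the paper's implementation.

```python
# Toy sketch of bijective vs. non-bijective token substitution ciphers.
# All function names, the vocabulary, and the substitution rate are
# illustrative assumptions, not taken from the paper.
import random

def make_bijective_cipher(vocab, seed=0):
    """One-to-one map: a permutation of the vocabulary, hence reversible."""
    rng = random.Random(seed)
    shuffled = vocab[:]
    rng.shuffle(shuffled)
    return dict(zip(vocab, shuffled))

def make_non_bijective_cipher(vocab, seed=0):
    """Random map with possible collisions: many-to-one, hence irreversible."""
    rng = random.Random(seed)
    return {tok: rng.choice(vocab) for tok in vocab}

def apply_cipher(tokens, cipher, p=0.5, seed=0):
    """Substitute roughly a fraction p of in-vocabulary tokens via the cipher."""
    rng = random.Random(seed)
    return [cipher[t] if t in cipher and rng.random() < p else t for t in tokens]

vocab = ["good", "bad", "movie", "plot", "boring", "great"]
tokens = "the movie had a great plot".split()
bij = make_bijective_cipher(vocab)
print(apply_cipher(tokens, bij))  # ciphered demo input; the latent mapping is fixed and invertible
```

Because the bijective cipher is a fixed permutation, the original input can in principle be recovered from the ciphered one, which is what makes the transformed task well-defined; the non-bijective variant destroys that guarantee and serves as the baseline.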
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: few-shot learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
Submission Number: 1535