In-Context Alignment at Scale: When More is Less

Published: 10 Jun 2025, Last Modified: 30 Jun 2025, MoFA Poster, CC BY 4.0
Keywords: In Context Learning, Scaling alignment
TL;DR: In-context instructions can align LLM behavior, but their effectiveness changes as the number of new rules grows. We characterize this scaling trend to help the community design prompts and evaluation datasets.
Abstract: In-context instructions are a widely used and accessible method for aligning model behavior through human feedback. However, as users increasingly expect LLMs to perform multiple tasks or exhibit diverse behaviors, the number of such instructions in the prompt grows rapidly. In this work, we investigate how LLMs $\textit{scale}$ in their ability to accurately incorporate new information or rules provided purely in context---especially when such information contradicts the model’s prior beliefs or behaviors, and when the amount of in-context information increases. We conduct experiments on controlled open-source benchmarks such as $\texttt{NewNews}$, which poses questions about hypothetical unseen news events, and we introduce a synthetic benchmark that injects explicit rules into the prompt. These rules are designed to be easy to evaluate and must be followed by the model to generate the correct response. Our analysis reveals several key insights: (1) larger models generally perform better at incorporating new information, though their accuracy degrades as the number of new facts increases, as expected; (2) prompt depth has limited overall effect, although in tasks involving similar rules, information placed at the beginning and end of the prompt is attended to more reliably; and (3) LLMs often ``cheat'' by exploiting superficial cues and struggle when true logical inference is required---highlighting the need for more robust evaluation protocols. These findings offer critical insight into the current limitations of in-context behavior alignment in LLMs at scale.
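To make the synthetic rule-injection setup concrete, the sketch below shows one way such a benchmark could be constructed and scored: each rule forces an easily checkable keyword into answers about a given topic. The rule templates, prompt layout, and the placeholder model call are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a synthetic rule-injection benchmark (not the paper's code).
# Each rule is trivially checkable: it forces a fixed keyword in answers about a topic.
import random

TOPICS = ["astronomy", "cooking", "chess", "botany", "geology", "sailing"]

def make_rules(n_rules, seed=0):
    """Create n_rules (topic -> required keyword) pairs."""
    rng = random.Random(seed)
    topics = rng.sample(TOPICS, k=min(n_rules, len(TOPICS)))
    return {t: f"TOKEN_{i}" for i, t in enumerate(topics)}

def build_prompt(rules, question_topic, question):
    """Place all rules in context, then ask a question that triggers one of them."""
    rule_lines = [
        f"Rule {i + 1}: When answering any question about {topic}, "
        f"include the word {token} in your answer."
        for i, (topic, token) in enumerate(rules.items())
    ]
    return "\n".join(rule_lines) + f"\n\nQuestion ({question_topic}): {question}\nAnswer:"

def score(response, rules, question_topic):
    """A rule is followed iff its required keyword appears in the response."""
    return float(rules[question_topic] in response)

if __name__ == "__main__":
    rules = make_rules(n_rules=4)
    topic = next(iter(rules))
    prompt = build_prompt(rules, topic, f"Tell me one fact about {topic}.")
    print(prompt)
    # response = query_model(prompt)   # placeholder for any LLM API call
    # print("rule followed:", score(response, rules, topic))
```

Varying `n_rules` and the position of the triggered rule within the prompt would correspond to the scaling and prompt-depth axes discussed in the abstract.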
Submission Number: 68