Keywords: Large language models, inverse scaling
TL;DR: In the context of piecewise function evaluation, we show that LLM performance decreases when models are given correct but unrepresentative few-shot examples, and that this failure mode becomes more severe with increasing model size.
Abstract: We investigate whether pretrained language models (LMs) can be misled by providing them with factually correct but unrepresentative (biased) examples, in the context of integer-to-integer piecewise functions. Given the definition of a piecewise function and several examples of the function’s evaluation, we instruct LMs to apply the function to a new input. We assess LMs on two variants of this task: one where the example function evaluations are evenly distributed across both branches of the function, and one where all of the examples exercise one branch of the function while the target input exercises the other branch. We observe that model performance positively scales with model size only when examples are balanced, and that performance inversely scales with size when the examples are unrepresentative.
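To make the task setup concrete, the following is a minimal sketch (not the authors' actual code; the specific function, prompt wording, and input choices are illustrative assumptions) of an integer-to-integer piecewise function and the two prompt variants: a balanced one, and a biased one whose examples all exercise one branch while the query exercises the other.

```python
def f(x: int) -> int:
    # Hypothetical two-branch piecewise function (an assumption for illustration).
    return 2 * x + 1 if x < 0 else 3 * x - 2

def build_prompt(example_inputs, query):
    # Assemble a few-shot prompt: function definition, worked examples, query.
    lines = [
        "f(x) = 2x + 1 if x < 0, otherwise f(x) = 3x - 2.",
        "Apply f to the given input.",
    ]
    for x in example_inputs:
        lines.append(f"f({x}) = {f(x)}")
    lines.append(f"f({query}) =")
    return "\n".join(lines)

# Biased variant: every example exercises the x >= 0 branch,
# while the query -3 falls in the x < 0 branch.
biased = build_prompt([1, 4, 7, 10], query=-3)

# Balanced variant: examples split evenly across both branches.
balanced = build_prompt([-5, -2, 3, 6], query=-3)
```

Comparing model accuracy on completions of the `biased` versus `balanced` prompts, across model sizes, is what distinguishes the two task variants described above.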