Keywords: large language models, alignment, in-context learning
TL;DR: We study changes in response diversity after alignment and conclude the effects of current alignment methods are well described as a shift from distributional to Overton pluralism.
Abstract: The alignment process changes several properties of a large language model's (LLM's) output distribution. In this work, we re-examine previously reported reductions in response diversity post-alignment in open-ended QA. Our analysis suggests that the apparent drop in response diversity is largely explained by quality control and information aggregation. Both fine-tuning- and prompting-based alignment techniques suppress irrelevant and unhelpful content while shifting the output distribution toward longer responses that cover multiple samples from the base LLM, essentially presenting diverse information within a single response. We argue these changes are well characterized as a shift from distributional pluralism to Overton pluralism, rather than an overall reduction in response diversity, and highlight the need for decoupled measures of semantic and lexical diversity.
Submission Number: 35