Attributing Mode Collapse in the Fine-Tuning of Large Language Models

ICLR 2024 Workshop ME-FoMo Submission 30

Published: 04 Mar 2024, Last Modified: 05 May 2024
ME-FoMo 2024 Poster, CC BY 4.0
Keywords: large language models, mode collapse
TL;DR: Analysis of token-level and output diversity in pretrained and fine-tuned LLMs.
Abstract: Large language models (LLMs) are typically trained in two stages: first, pre-training on a large, diverse dataset for general-purpose language modeling capabilities, followed by a fine-tuning stage (often called “instruction tuning” or “alignment”) on smaller, more curated datasets to adapt them to a specific task or downstream application, such as chat or general instruction-following. It is a well-known anecdotal observation that instruction-tuned models have lower output diversity; for example, ChatGPT famously cannot seem to generate more than a handful of distinct jokes. Low output diversity means a model cannot generate varied outputs, which can be a limitation for many use cases. In this manuscript, we quantify how each step in a typical RLHF or instruction-tuning pipeline changes a model’s diversity, for a series of models trained in a controlled fine-tuning setup, and compare these models to some open-weight models. We distinguish between two categories of diversity in LLMs: token-level prediction diversity and model output generation diversity. We find that the supervised fine-tuning and reward-based fine-tuning steps have different effects on these distinct diversity types. Our results have implications for better understanding the effects of instruction tuning on the diversity of language models.
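The abstract does not specify the exact metrics used for the two diversity categories; the sketch below illustrates one plausible instantiation of the distinction, assuming token-level prediction diversity is measured as mean next-token entropy and output generation diversity as a distinct-n ratio over sampled completions. The model name ("gpt2") and prompt are placeholders, not choices from the paper.

```python
# Minimal sketch of the two diversity notions: token-level prediction
# diversity (entropy of the next-token distribution) vs. output generation
# diversity (distinct n-grams across sampled completions). Metrics here are
# assumptions for illustration, not the paper's definitions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Tell me a joke about computers."
inputs = tokenizer(prompt, return_tensors="pt")

# Token-level prediction diversity: entropy of the model's next-token
# distribution at each prompt position, averaged over positions.
with torch.no_grad():
    logits = model(**inputs).logits            # shape: (1, seq_len, vocab)
probs = torch.softmax(logits, dim=-1)
entropy = -(probs * torch.log(probs.clamp_min(1e-12))).sum(-1)
print(f"mean next-token entropy: {entropy.mean().item():.3f} nats")

# Output generation diversity: sample several completions and measure the
# fraction of unique bigrams (distinct-2) pooled across all samples.
gen = model.generate(
    **inputs,
    do_sample=True,
    top_p=0.95,
    temperature=1.0,
    max_new_tokens=30,
    num_return_sequences=8,
    pad_token_id=tokenizer.eos_token_id,
)
prompt_len = inputs["input_ids"].shape[1]
completions = [
    tokenizer.decode(g[prompt_len:], skip_special_tokens=True) for g in gen
]

def distinct_n(texts, n=2):
    """Unique n-grams divided by total n-grams, pooled over all samples."""
    ngrams = []
    for t in texts:
        toks = t.split()
        ngrams.extend(zip(*(toks[i:] for i in range(n))))
    return len(set(ngrams)) / max(len(ngrams), 1)

print(f"distinct-2 over {len(completions)} samples: {distinct_n(completions):.3f}")
```

Comparing these two numbers before and after each fine-tuning stage would let one separate changes in per-token predictive uncertainty from changes in the variety of whole generations, which is the distinction the abstract draws.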
Submission Number: 30