Zephyr: Direct Distillation of LM Alignment

Published: 10 Jul 2024 · Last Modified: 26 Aug 2024 · COLM · CC BY 4.0
Research Area: Alignment
Keywords: AI feedback, open LLMs, RLHF
TL;DR: A method for aligning open language models with AI preference feedback at scale.
Abstract: We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e., they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, set a new state of the art on chat benchmarks for 7B-parameter models while requiring no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpassed Llama2-Chat-70B, at the time the best open-access RLHF-based model.
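For readers unfamiliar with the dDPO step, the abstract does not spell out the training objective; the following is a minimal sketch of the standard DPO loss (Rafailov et al., 2023), which dDPO applies to teacher-ranked (AIF) preference pairs rather than human-labeled ones. The notation here is an assumption for illustration and is not defined on this page: $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ the dSFT reference model, $(y_w, y_l)$ the preferred and dispreferred completions for prompt $x$, and $\beta$ a temperature hyperparameter.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Because the ranked completions are generated and labeled offline by the teacher, this objective can be optimized directly on the static preference dataset, which is consistent with the abstract's claim that no additional sampling is needed during fine-tuning.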
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1089