Sequence-level Large Language Model Training with Contrastive Preference Optimization

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
Abstract: Next-token prediction is the dominant self-supervised training objective for large language models and has achieved promising results across a variety of downstream tasks. However, upon closer investigation, we find that this objective lacks an understanding of sequence-level signals, leading to a mismatch between training and inference. To bridge this gap, we introduce a contrastive preference optimization procedure that can inject sequence-level signals into the language model at any training stage without expensive human labeling. Notably, our experiments reveal that the proposed objective surpasses next-token prediction in terms of GPT win rate on both instruction following and text generation. Specifically, using OpenLlama-3B, our method achieves a $13\%$ improvement on an instruction-following task and a $3\%$ increase on a text generation task.
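
The abstract does not spell out the training loss, so the following is a minimal sketch, assuming a DPO-style pairwise contrastive formulation over sequence-level log-probabilities: the model is pushed to assign higher probability to a preferred output than to a dispreferred one. All names here (contrastive_preference_loss, logp_chosen, logp_rejected, beta) are illustrative assumptions, not the paper's actual API or loss.

```python
import torch
import torch.nn.functional as F

def contrastive_preference_loss(
    logp_chosen: torch.Tensor,    # sequence log-probs of preferred outputs, shape (batch,)
    logp_rejected: torch.Tensor,  # sequence log-probs of dispreferred outputs, shape (batch,)
    beta: float = 0.1,            # margin temperature; 0.1 is an assumed default
) -> torch.Tensor:
    """Assumed pairwise contrastive loss over sequence log-probabilities.

    Each sequence log-prob is the sum of the policy's token log-probs over
    the sampled sequence; the loss rewards a positive margin between the
    preferred and dispreferred sequences.
    """
    margin = beta * (logp_chosen - logp_rejected)
    return -F.logsigmoid(margin).mean()

# Toy usage: in practice these log-probs would be computed by summing the
# policy model's token log-probabilities over each candidate sequence.
logp_w = torch.tensor([-12.3, -8.7])
logp_l = torch.tensor([-14.1, -9.5])
print(contrastive_preference_loss(logp_w, logp_l).item())
```

Because the loss only compares whole-sequence scores, it supplies the sequence-level signal that per-token prediction lacks, and pairs can be built from model samples ranked automatically rather than from costly human labels.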
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English