Instruction Bootstrapped Preference Optimization: Improving Model Alignment with a Better Instruction
Abstract: Instruction tuning and preference alignment have played pivotal roles in recent advances in large language models (LLMs).
Empirical observations reveal that when these models are given a bootstrapping instruction such as "please generate a better response" after producing an initial output, they can generate significantly improved subsequent responses.
This finding highlights the critical role of both the initial output and the bootstrapping instruction in preference alignment, and suggests an important connection between abstract preference definitions and their concrete textual expressions.
Based on this insight, we propose Instruction Bootstrapped Preference Optimization (IBPO), a plug-in approach that refines the instruction fine-tuning, preference optimization, and inference steps of LLMs.
IBPO systematically integrates paired preference data and bootstrapping instructions into unified sequences, making more effective use of preference data while strengthening the association between the textual expressions in the preference data and the preference descriptions in the instruction.
Experiments on multiple datasets demonstrate that IBPO achieves improvements of more than 10% over several existing preference alignment baselines. Ablation experiments and mechanistic analysis provide potential explanations for these improvements.
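To make the data construction described in the abstract concrete, below is a minimal sketch of how a paired preference example might be folded into a single bootstrapped sequence. The function and field names (build_ibpo_sequence, prompt, chosen, rejected) and the exact instruction template are illustrative assumptions, not the paper's released implementation.

```python
# Minimal sketch (assumed, not the authors' code): turn a standard preference
# pair into one "bootstrapped" training sequence in which the rejected answer
# is followed by a bootstrapping instruction and then the chosen answer,
# tying the preference signal to an explicit textual instruction.

BOOTSTRAP_INSTRUCTION = (
    "The response above is not good enough. "
    "Please generate a better response."
)  # hypothetical template; the paper's exact wording may differ

def build_ibpo_sequence(prompt: str, chosen: str, rejected: str) -> str:
    """Concatenate prompt, initial (rejected) output, bootstrapping
    instruction, and improved (chosen) output into one unified sequence."""
    return (
        f"User: {prompt}\n"
        f"Assistant: {rejected}\n"
        f"User: {BOOTSTRAP_INSTRUCTION}\n"
        f"Assistant: {chosen}"
    )

if __name__ == "__main__":
    example = {
        "prompt": "Explain why the sky is blue.",
        "chosen": "Sunlight is scattered by air molecules; shorter (blue) "
                  "wavelengths scatter more strongly, so the sky looks blue.",
        "rejected": "Because the ocean reflects onto it.",
    }
    print(build_ibpo_sequence(**example))
```

One natural design choice under this sketch is to compute the fine-tuning loss only on the tokens of the improved (chosen) response, so the model learns to associate the bootstrapping instruction with the preferred textual expression rather than with the rejected one; whether IBPO does exactly this is not specified in the abstract.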
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: generative models, contrastive learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 346