Abstract: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints. However, enhancing LLMs' ability to follow soft constraints remains largely unexplored. To bridge this gap, we first design a pipeline that automatically obtains high-quality outputs. Then, to fully utilize the positive and negative examples generated during data construction, we adopt Direct Preference Optimization (DPO) as the training method. Furthermore, since the number of constraints reflects the difficulty of following soft constraints, we design a curriculum learning paradigm that orders training data by constraint quantity. We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability and analyze the factors driving the improvements.
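The abstract names two training components: DPO over automatically constructed positive/negative outputs, and a curriculum ordered by the number of soft constraints. The sketch below is a minimal illustration of how these pieces could fit together; it is not the authors' code, and all identifiers (`PreferencePair`, `num_constraints`, `dpo_loss`, `curriculum_stages`) are hypothetical assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation) of DPO training with a
# curriculum ordered by soft-constraint count. Names are illustrative.
from dataclasses import dataclass
from typing import Iterator, List

import torch
import torch.nn.functional as F


@dataclass
class PreferencePair:
    prompt: str
    chosen: str           # output that satisfies the soft constraints
    rejected: str          # output that violates at least one constraint
    num_constraints: int   # difficulty proxy used for the curriculum


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on per-sequence log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def curriculum_stages(pairs: List[PreferencePair]) -> Iterator[List[PreferencePair]]:
    """Yield training stages ordered by constraint count (easy -> hard)."""
    for k in sorted({p.num_constraints for p in pairs}):
        yield [p for p in pairs if p.num_constraints == k]


if __name__ == "__main__":
    data = [
        PreferencePair("p1", "good", "bad", num_constraints=3),
        PreferencePair("p2", "good", "bad", num_constraints=1),
        PreferencePair("p3", "good", "bad", num_constraints=2),
    ]
    for stage in curriculum_stages(data):
        # In a real setup, each stage would be one DPO training phase.
        print([p.num_constraints for p in stage])

    # Toy tensors standing in for sequence log-probs from policy/reference models.
    loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                    torch.tensor([-5.0]), torch.tensor([-5.5]))
    print(loss.item())
```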
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, continual learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 6473