Abstract: It is crucial for large language models (LLMs) to follow instructions that involve multiple constraints. However, enhancing LLMs' ability to follow soft constraints remains largely unexplored. To bridge this gap, we first design a pipeline that automatically obtains high-quality outputs. Then, to fully utilize the positive and negative examples generated during data construction, we adopt Direct Preference Optimization (DPO) as the training method. Furthermore, since the number of constraints reflects the difficulty of following soft constraints, we design a curriculum learning paradigm that orders training data by constraint quantity. We experimentally evaluate the effectiveness of our methods in improving LLMs' soft constraint following ability and analyze the factors driving the improvements.
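The abstract names two training components: DPO over automatically constructed positive/negative outputs, and a curriculum ordered by the number of soft constraints. The sketch below is a minimal illustration of how these pieces could fit together; it is not the authors' code, and all identifiers (`PreferencePair`, `num_constraints`, `dpo_loss`, `curriculum_stages`) are hypothetical assumptions for illustration only.

```python
# Minimal sketch (not the paper's implementation) of DPO training with a
# curriculum ordered by soft-constraint count. Names are illustrative.
from dataclasses import dataclass
from typing import Iterator, List

import torch
import torch.nn.functional as F


@dataclass
class PreferencePair:
    prompt: str
    chosen: str           # output that satisfies the soft constraints
    rejected: str          # output that violates at least one constraint
    num_constraints: int   # difficulty proxy used for the curriculum


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective on per-sequence log-probabilities."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()


def curriculum_stages(pairs: List[PreferencePair]) -> Iterator[List[PreferencePair]]:
    """Yield training stages ordered by constraint count (easy -> hard)."""
    for k in sorted({p.num_constraints for p in pairs}):
        yield [p for p in pairs if p.num_constraints == k]


if __name__ == "__main__":
    data = [
        PreferencePair("p1", "good", "bad", num_constraints=3),
        PreferencePair("p2", "good", "bad", num_constraints=1),
        PreferencePair("p3", "good", "bad", num_constraints=2),
    ]
    for stage in curriculum_stages(data):
        # In a real setup, each stage would be one DPO training phase.
        print([p.num_constraints for p in stage])

    # Toy tensors standing in for sequence log-probs from policy/reference models.
    loss = dpo_loss(torch.tensor([-4.0]), torch.tensor([-6.0]),
                    torch.tensor([-5.0]), torch.tensor([-5.5]))
    print(loss.item())
```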
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, continual learning
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 6473