Align and Adapt: Enhancing LLM Format Alignment and Knowledge Adaptation via Reverse Constraints Generation
Keywords: Instruction Tuning, Complex Instruction Following, Long-form QA, Format Alignment, Back Translation, Data Augmentation
Abstract: Building effective LLM agents requires strong instruction-following capability in addition to domain knowledge. While human-annotated long-form QA (LFQA) datasets contain rich factual content, we find that directly fine-tuning on them degrades instruction-following performance, making it impractical to create domain-specific agents this way. Recent research on instruction tuning has focused on augmenting existing instruction-tuning or conversational datasets to create complex instruction-tuning datasets, enabling LLMs to better handle fine-grained and nuanced instructions. While effective, these augmentation approaches risk distorting the semantic content of LFQA datasets. We propose REFER (REstructure, Feature Extract, Reverse constraint generation), a framework that transforms human-annotated LFQA datasets into high-quality instruction-tuning datasets centered on verifiable constraints. REFER preserves the original semantics while integrating fine-grained format constraints into the data, enabling LLMs to improve instruction-following capability without sacrificing domain knowledge. Extensive evaluations on instruction-following benchmarks show that LLaMA-2-7B models fine-tuned with REFER generalize better on complex and multi-turn instruction following than models trained with either standard instruction tuning or direct LFQA fine-tuning. REFER also emphasizes security and efficiency: all data augmentation is performed without external APIs, and supervised fine-tuning uses lightweight, reproducible LoRA adapters. Our results demonstrate that REFER enables the practical creation of domain-specific LLM agents with enhanced instruction-following capability, something unattainable with naive LFQA fine-tuning.
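To make the "reverse constraint generation" idea concrete, the sketch below shows one plausible instantiation: extract verifiable format features from a gold long-form answer, then rewrite the question so that the gold answer already satisfies the stated constraints. The function names and the specific constraint types (bullet count, word count, paragraph count) are illustrative assumptions, not taken from the paper.

```python
def extract_format_features(answer: str) -> dict:
    """Extract checkable surface features from a gold answer (illustrative set)."""
    lines = answer.splitlines()
    return {
        "num_bullets": sum(1 for l in lines if l.lstrip().startswith("- ")),
        "num_words": len(answer.split()),
        "num_paragraphs": len([p for p in answer.split("\n\n") if p.strip()]),
    }

def reverse_constraints(question: str, answer: str) -> str:
    """Append constraints to the question that the gold answer already satisfies,
    so the original (question + constraints, answer) pair is a valid
    instruction-tuning example with verifiable format requirements."""
    feats = extract_format_features(answer)
    constraints = []
    if feats["num_bullets"]:
        constraints.append(f"use exactly {feats['num_bullets']} bullet points")
    constraints.append(f"write about {feats['num_words']} words")
    constraints.append(f"use {feats['num_paragraphs']} paragraph(s)")
    return question.rstrip() + " In your answer, " + "; ".join(constraints) + "."

# Toy example: the augmented question keeps the original semantics intact.
q = "What causes tides?"
a = ("Tides are caused by gravity.\n\n"
     "- The Moon's pull dominates.\n"
     "- The Sun contributes too.")
print(reverse_constraints(q, a))
```

Because the constraints are derived from the answer rather than sampled freely, the gold answer never needs to be rewritten, which is how this style of augmentation avoids distorting the original semantics.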
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10975