Align and Adapt: Enhancing LLM Format Alignment and Knowledge Adaptation via Reverse Constraints Generation
Keywords: Instruction Tuning, Complex Instruction Following, Long-form QA, Format Alignment, Back Translation, Data Augmentation
Abstract: Building effective LLM agents requires strong instruction-following capability in addition to domain knowledge. While human-annotated long-form QA (LFQA) datasets contain rich factual content, we find that directly fine-tuning on them degrades instruction-following performance, making it impractical to create domain-specific agents this way. Recent research on instruction tuning has focused on augmenting existing instruction-tuning or conversational datasets to create complex instruction-tuning datasets, enabling LLMs to better handle fine-grained and nuanced instructions. While effective, these augmentation approaches risk distorting the semantic content of LFQA datasets. We propose REFER (REstructure, Feature Extract, Reverse constraint generation), a framework that transforms human-annotated LFQA datasets into high-quality instruction-tuning datasets centered on verifiable constraints. REFER preserves the original semantics while integrating fine-grained format constraints into the data, enabling LLMs to improve instruction-following capability without sacrificing domain knowledge. Extensive evaluations on instruction-following benchmarks show that LLaMA-2-7B models fine-tuned with REFER generalize better on complex and multi-turn instruction following than models trained with either standard instruction tuning or direct LFQA fine-tuning. REFER also emphasizes security and efficiency: all data augmentation is performed without external APIs, and supervised fine-tuning uses lightweight, reproducible LoRA adapters. Our results demonstrate that REFER enables the practical creation of domain-specific LLM agents with enhanced instruction-following capability, something unattainable with naive LFQA fine-tuning.
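To make the "reverse constraint generation" idea concrete, the sketch below shows one plausible instantiation: extract verifiable format features from a gold long-form answer, then rewrite the question so that the gold answer already satisfies the stated constraints. The function names and the specific constraint types (bullet count, word count, paragraph count) are illustrative assumptions, not taken from the paper.

```python
def extract_format_features(answer: str) -> dict:
    """Extract checkable surface features from a gold answer (illustrative set)."""
    lines = answer.splitlines()
    return {
        "num_bullets": sum(1 for l in lines if l.lstrip().startswith("- ")),
        "num_words": len(answer.split()),
        "num_paragraphs": len([p for p in answer.split("\n\n") if p.strip()]),
    }

def reverse_constraints(question: str, answer: str) -> str:
    """Append constraints to the question that the gold answer already satisfies,
    so the original (question + constraints, answer) pair is a valid
    instruction-tuning example with verifiable format requirements."""
    feats = extract_format_features(answer)
    constraints = []
    if feats["num_bullets"]:
        constraints.append(f"use exactly {feats['num_bullets']} bullet points")
    constraints.append(f"write about {feats['num_words']} words")
    constraints.append(f"use {feats['num_paragraphs']} paragraph(s)")
    return question.rstrip() + " In your answer, " + "; ".join(constraints) + "."

# Toy example: the augmented question keeps the original semantics intact.
q = "What causes tides?"
a = ("Tides are caused by gravity.\n\n"
     "- The Moon's pull dominates.\n"
     "- The Sun contributes too.")
print(reverse_constraints(q, a))
```

Because the constraints are derived from the answer rather than sampled freely, the gold answer never needs to be rewritten, which is how this style of augmentation avoids distorting the original semantics.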
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 10975