Keywords: LLM Alignment, Direct Preference Optimization, Linguistic Empathy, Syntactic Minimal Pairs, Legal Question Answering
Abstract: Aligning LLMs to produce responses perceived as empathetic typically relies on costly human preference data and offers limited insight into which linguistic cues drive those preferences. We study legal question answering and introduce an automatic preference-data pipeline based on parser-validated syntactic minimal pairs. Grounded in linguistic accounts of perspective-taking, we generate rule-labeled minimal pairs along five dimensions (pronouns, voice, tense, polite imperatives, evaluative adverbs) and validate each pair's intended contrast with dependency parsing (82.7% success rate). From 1,785 questions, we produce 7,378 minimal pairs and fine-tune three 7–8B model families (LLaMA-3, Mistral, Gemma) with DPO. In human evaluation (3,309 judgments, 35 raters), DPO outputs are preferred over an SFT baseline (68.8% vs. 31.2%, $p<0.001$, $h=0.40$), a result robust to length controls. Feature validation shows that voice is the dominant above-chance contributor (80%, $h=0.64$), while the other edits are register-sensitive. Overall, parser-validated minimal pairs provide an interpretable, scalable route to preference optimization and identify which linguistic cues align with human judgments in-domain.
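To make the validation step concrete, here is a minimal sketch (illustrative helper names, not the paper's released code) of parser-based validation for one of the five dimensions, active-to-passive voice. It assumes spaCy's en_core_web_sm English model, in whose dependency scheme the labels nsubjpass and auxpass mark passive constructions; a generated pair is kept only when the parse confirms the intended contrast.

```python
# Parser-based validation sketch for the voice dimension (assumed names,
# not the paper's released pipeline). Requires spaCy and en_core_web_sm.
import spacy

nlp = spacy.load("en_core_web_sm")

def is_passive(sentence: str) -> bool:
    """True if the dependency parse contains a passive construction."""
    doc = nlp(sentence)
    return any(tok.dep_ in ("nsubjpass", "auxpass") for tok in doc)

def validates_voice_pair(active: str, passive: str) -> bool:
    """Keep a minimal pair only if the parser confirms the intended contrast."""
    return not is_passive(active) and is_passive(passive)

# A rule-generated pair is accepted only when the transform actually succeeded.
print(validates_voice_pair(
    "The court dismissed the claim.",
    "The claim was dismissed by the court.",
))  # True -> keep the pair; a failed transform would be discarded
```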
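Accepted pairs then feed standard preference optimization. The sketch below assumes HuggingFace TRL's DPOTrainer (argument names vary across TRL releases); the model ID, hyperparameters, and the direction of the pair (here, illustratively, a second-person rewrite as "chosen" and the original as "rejected") are assumptions, not the paper's configuration.

```python
# DPO fine-tuning sketch on minimal-pair preference data, assuming TRL.
# Model ID, beta, and the example pair are illustrative placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # one of the three studied families
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each parser-validated minimal pair becomes one preference record.
pairs = Dataset.from_dict({
    "prompt":   ["Can my landlord evict me without notice?"],
    "chosen":   ["No: you must be given notice before any eviction can proceed."],
    "rejected": ["No: the landlord must give the tenant notice before any eviction."],
})

trainer = DPOTrainer(
    model=model,                # a frozen reference copy is created internally
    args=DPOConfig(output_dir="dpo-minimal-pairs", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer, # named `tokenizer` in older TRL releases
)
trainer.train()
```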
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Alignment, Direct Preference Optimization, Interpretability, Legal NLP, Data Generation/Augmentation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 6206