Keywords: LLM Alignment, Direct Preference Optimization, Linguistic Empathy, Syntactic Minimal Pairs, Legal Question Answering
Abstract: Aligning LLMs to produce responses perceived as empathetic typically relies on costly human preference data and offers limited insight into which linguistic cues drive those preferences. We study legal question answering and introduce an automatic preference-data pipeline based on parser-validated syntactic minimal pairs. Grounded in linguistic accounts of perspective-taking, we generate rule-labeled minimal pairs along five dimensions (pronouns, voice, tense, polite imperatives, evaluative adverbs) and validate each pair's intended contrast with dependency parsing (82.7% success rate). From 1,785 questions, we produce 7,378 minimal pairs and fine-tune three 7–8B model families (LLaMA-3, Mistral, Gemma) with DPO. In human evaluation (3,309 judgments, 35 raters), DPO outputs are preferred over an SFT baseline (68.8% vs. 31.2%, $p<0.001$, $h=0.40$), a result robust to length controls. Feature validation shows that voice is the dominant above-chance contributor (80%, $h=0.64$), while the other edits are register-sensitive. Overall, parser-validated minimal pairs provide an interpretable, scalable route to preference optimization and identify which linguistic cues align with human judgments in-domain.
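To make the validation step concrete, here is a minimal sketch (illustrative helper names, not the paper's released code) of parser-based validation for one of the five dimensions, active-to-passive voice. It assumes spaCy's en_core_web_sm English model, in whose dependency scheme the labels nsubjpass and auxpass mark passive constructions; a generated pair is kept only when the parse confirms the intended contrast.

```python
# Parser-based validation sketch for the voice dimension (assumed names,
# not the paper's released pipeline). Requires spaCy and en_core_web_sm.
import spacy

nlp = spacy.load("en_core_web_sm")

def is_passive(sentence: str) -> bool:
    """True if the dependency parse contains a passive construction."""
    doc = nlp(sentence)
    return any(tok.dep_ in ("nsubjpass", "auxpass") for tok in doc)

def validates_voice_pair(active: str, passive: str) -> bool:
    """Keep a minimal pair only if the parser confirms the intended contrast."""
    return not is_passive(active) and is_passive(passive)

# A rule-generated pair is accepted only when the transform actually succeeded.
print(validates_voice_pair(
    "The court dismissed the claim.",
    "The claim was dismissed by the court.",
))  # True -> keep the pair; a failed transform would be discarded
```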
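Accepted pairs then feed standard preference optimization. The sketch below assumes HuggingFace TRL's DPOTrainer (argument names vary across TRL releases); the model ID, hyperparameters, and the direction of the pair (here, illustratively, a second-person rewrite as "chosen" and the original as "rejected") are assumptions, not the paper's configuration.

```python
# DPO fine-tuning sketch on minimal-pair preference data, assuming TRL.
# Model ID, beta, and the example pair are illustrative placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"  # one of the three studied families
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each parser-validated minimal pair becomes one preference record.
pairs = Dataset.from_dict({
    "prompt":   ["Can my landlord evict me without notice?"],
    "chosen":   ["No: you must be given notice before any eviction can proceed."],
    "rejected": ["No: the landlord must give the tenant notice before any eviction."],
})

trainer = DPOTrainer(
    model=model,                # a frozen reference copy is created internally
    args=DPOConfig(output_dir="dpo-minimal-pairs", beta=0.1),
    train_dataset=pairs,
    processing_class=tokenizer, # named `tokenizer` in older TRL releases
)
trainer.train()
```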
Paper Type: Long
Research Area: Safety and Alignment in LLMs
Research Area Keywords: Alignment, Direct Preference Optimization, Interpretability, Legal NLP, Data Generation/Augmentation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 6206