Hierarchical Alignment: Surgical Fine-Tuning via Functional Layer Specialization in Large Language Models

ACL ARR 2026 January Submission 7179 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · License: CC BY 4.0
Keywords: Hierarchical Alignment, alignment tax, Functional Layer
Abstract: Standard Direct Preference Optimization (DPO) treats Large Language Models as monolithic blocks. We propose Hierarchical Alignment, a surgical fine-tuning framework that exploits the functional specialization inherent in Transformer architectures by selectively optimizing distinct layer blocks: shallow (Local-Align), middle (Mid-Align), and deep (Global-Align). Through extensive evaluation across four state-of-the-art model families (Llama-2/3.1 and Qwen-2.5/3) under a rigorous 16-dimensional "LLM-as-Judge" protocol, we show that Mid-Align consistently matches or exceeds full-parameter DPO while updating significantly fewer parameters, identifying the middle layers as the critical nexus for semantic coherence and knowledge integration. Our findings reveal a fundamental "bottom-up" representational dependency, in which late-layer updates alone prove insufficient for behavioral alignment, and establish that hierarchical strategies induce predictable, dimension-specific shifts in model behavior. We therefore advocate a transition toward architecture-aware alignment as a more efficient, interpretable, and controllable paradigm for shaping intelligent systems.
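The sketch below illustrates the core idea of one hierarchical variant (Mid-Align): freeze all parameters except a middle band of decoder blocks and optimize only those with the standard DPO objective. The model identifier, the choice of the middle third as the block boundary, and the attribute path `model.model.layers` (Llama/Qwen-style naming) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of Mid-Align-style surgical fine-tuning (assumptions noted above).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # policy (assumed checkpoint)
ref   = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")  # frozen DPO reference

# Freeze everything in both models first.
for p in list(model.parameters()) + list(ref.parameters()):
    p.requires_grad = False

# Unfreeze only the middle third of decoder blocks ("Mid-Align").
layers = model.model.layers                      # Llama/Qwen-style attribute path (assumption)
lo, hi = len(layers) // 3, 2 * len(layers) // 3
for block in layers[lo:hi]:
    for p in block.parameters():
        p.requires_grad = True

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss on summed log-probs of chosen vs. rejected responses."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Only the unfrozen middle-layer parameters receive gradient updates.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=5e-7
)
```

The same pattern yields Local-Align or Global-Align by selecting the first or last band of blocks instead of the middle third.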
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Hierarchical Alignment, alignment tax, Functional Layer
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 7179