Position: Alignment as a Rank-Routed Adapter Problem — Rule-Class Decomposition Before Low-Rank Preference Learning

09 May 2026 (modified: 09 May 2026), ICML 2026 Workshop CoLoRAI Submission, CC BY 4.0
Keywords: alignment, low-rank adapters, LoRA, rule-class routing, preference learning, social choice, position paper
TL;DR: Alignment compresses contested rules into one global policy; rule classes (publicly-justified, contested, rights-protective) should route through architecturally separate low-rank adapters before preference learning.
Abstract: This position paper argues that alignment pipelines should classify rules before they aggregate preferences, and that this classification step is naturally implemented as a low-rank adapter routing layer. Current methods (RLHF, Constitutional AI, DPO, RLAIF, Deliberative Alignment) compress contested normative input into a single global policy via scalar reward. Social-choice theory (Arrow, Gibbard-Satterthwaite, Sen) establishes that no procedure for aggregating unrestricted value disagreement can jointly satisfy minimal democratic, strategic, and liberty-preserving conditions. Existing low-rank methods (LoRA and its successors) already give the alignment community the mechanical infrastructure for class-conditional behavioral specialization without retraining the base model; what is missing is a principled routing layer above them. We propose a three-class decomposition. Class I rules (publicly-justified prohibitions, e.g. CSAM and WMD synthesis) admit aggregation and are correctly addressed by the base model's preference-learned weights. Class II rules (reasonable-disagreement domains) should be routed to community-specific low-rank adapters with disclosed defaults. Class III rules (rights-protective constraints) should be implemented as non-aggregative low-rank constraint heads with an appeal log. We give a six-step routing procedure and a five-item disclosure checklist, and argue that the low-rank-representations community is uniquely positioned to provide the architectural primitives for this decomposition. Because LoRA-style adapters are already the dominant low-rank mechanism for behavioral specialization, the technical infrastructure exists; what alignment lacks is the rank-decomposition discipline of deciding which behavior class is routed to which low-rank component.
Submission Number: 145
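Illustrative sketch (not part of the submission): the three-class decomposition described in the abstract can be pictured as a router sitting above a frozen base weight matrix. The NumPy toy below is a minimal sketch under stated assumptions, not the paper's six-step procedure, which the abstract does not spell out. All names (`RuleClass`, `LowRankAdapter`, `RankRouter`, `build_demo_router`, the community keys) are hypothetical, the rule class is assumed to be supplied by some upstream classifier, and the "constraint head" is modeled simply as one fixed low-rank update that ignores the community argument.

```python
from dataclasses import dataclass, field
from enum import Enum

import numpy as np


class RuleClass(Enum):
    PUBLICLY_JUSTIFIED = 1  # Class I: handled by the base weights alone
    CONTESTED = 2           # Class II: community-specific adapter
    RIGHTS_PROTECTIVE = 3   # Class III: non-aggregative constraint head


@dataclass
class LowRankAdapter:
    """LoRA-style update: delta(x) = B @ (A @ x), with rank r = A.shape[0]."""
    A: np.ndarray  # shape (r, d_in)
    B: np.ndarray  # shape (d_out, r)

    def delta(self, x: np.ndarray) -> np.ndarray:
        return self.B @ (self.A @ x)


@dataclass
class RankRouter:
    W: np.ndarray                    # frozen base weights, shape (d_out, d_in)
    community_adapters: dict         # hypothetical: community name -> LowRankAdapter
    default_community: str           # the disclosed default for Class II
    constraint_head: LowRankAdapter  # shared by everyone, never community-specific
    appeal_log: list = field(default_factory=list)

    def forward(self, x, rule_class, community=None):
        base = self.W @ x
        if rule_class is RuleClass.PUBLICLY_JUSTIFIED:
            return base
        if rule_class is RuleClass.CONTESTED:
            # Unknown or missing communities fall back to the disclosed default.
            key = community if community in self.community_adapters else self.default_community
            return base + self.community_adapters[key].delta(x)
        # Class III: the community is ignored for the output and the event is logged.
        self.appeal_log.append({"rule_class": rule_class.name, "community": community})
        return base + self.constraint_head.delta(x)


def build_demo_router(d: int = 4) -> RankRouter:
    """Tiny rank-1 example with hand-picked values (purely illustrative)."""
    ones_row, ones_col = np.ones((1, d)), np.ones((d, 1))
    return RankRouter(
        W=np.eye(d),
        community_adapters={
            "default": LowRankAdapter(A=ones_row, B=np.zeros((d, 1))),
            "community_a": LowRankAdapter(A=ones_row, B=0.5 * ones_col),
        },
        default_community="default",
        constraint_head=LowRankAdapter(A=ones_row, B=-ones_col),
    )


if __name__ == "__main__":
    router = build_demo_router()
    x = np.ones(4)
    print(router.forward(x, RuleClass.PUBLICLY_JUSTIFIED))
    print(router.forward(x, RuleClass.CONTESTED, community="community_a"))
    print(router.forward(x, RuleClass.RIGHTS_PROTECTIVE, community="community_a"))
    print(router.appeal_log)
```

The design point the sketch tries to make concrete is that the base weights are never modified: Class II behavior differs only through the selected adapter, and the Class III path produces the same output whatever community is passed in, while recording each invocation for later appeal.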