Keywords: Steerable Pluralistic Alignment, PEFT, MoE, LoRA, RLVR, GRPO
Abstract: Steerable pluralistic alignment aims to enable large language models (LLMs) to reliably adhere to diverse and potentially conflicting human values, particularly when target objectives involve multi-dimensional, compositional values. Current methods largely rely on prompt engineering or reasoning-time guidance, which often results in fragile and non‑persistent control once prompts are perturbed or omitted.
In this work, we study value-controllable alignment through discrete condition vectors and propose Verifiable-Reward-Routed LoRA, a parameter-efficient mixture-of-experts LoRA framework with conditioned gating. The gating mechanism dynamically routes computation among multiple LoRA experts based on an input value or moral condition vector. To ensure that such routing yields semantically compliant outputs, we formulate post-training as a reinforcement learning problem with verifiable rewards. We further introduce a conditional consistency reward, computed by an external model-based verifier implemented as a lightweight discriminator, and optimize the adapter parameters with Group Relative Policy Optimization (GRPO); an illustrative sketch of these mechanisms follows the abstract.
Experiments on the Touché23-ValueEval (value alignment) and MIC (moral alignment) benchmarks, using two 8-billion-parameter backbones, show that our method consistently outperforms prompt-based steering and multi-task PEFT baselines. It attains the highest overall controllability on micro-F1, macro-F1, and Jaccard similarity, a conclusion further reinforced by human pairwise evaluations.
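Below is a minimal, illustrative PyTorch sketch of the two mechanisms the abstract describes: a gate conditioned on a value vector that mixes LoRA experts, and a Jaccard-style conditional consistency reward scored from an external verifier's prediction. All names here (ConditionedMoELoRALinear, cond_dim, n_experts, conditional_consistency_reward) are hypothetical; this is a sketch of the general technique under assumed shapes and defaults, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedMoELoRALinear(nn.Module):
    """Frozen base linear layer plus several LoRA experts.

    A gate reads a discrete value/moral condition vector and produces
    mixture weights over the experts; the weighted sum of the low-rank
    updates is added to the frozen base projection, so only the adapters
    and the gate are trainable (PEFT).
    """

    def __init__(self, base: nn.Linear, n_experts: int = 4,
                 rank: int = 8, cond_dim: int = 10, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # backbone stays frozen

        d_in, d_out = base.in_features, base.out_features
        self.scaling = alpha / rank
        # One (A, B) low-rank pair per expert; B starts at zero so the
        # adapted layer initially equals the base layer.
        self.A = nn.Parameter(torch.randn(n_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, d_out))
        # The gate is conditioned on the value vector, not on the prompt,
        # so steering persists even when the prompt omits the values.
        self.gate = nn.Linear(cond_dim, n_experts)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_in); cond: (batch, cond_dim), e.g. multi-hot.
        weights = F.softmax(self.gate(cond), dim=-1)           # (batch, E)
        delta = torch.einsum("bsd,edr,ero->bseo", x, self.A, self.B)
        delta = torch.einsum("bseo,be->bso", delta, weights)   # mix experts
        return self.base(x) + self.scaling * delta


def conditional_consistency_reward(pred: torch.Tensor,
                                   cond: torch.Tensor) -> float:
    """Jaccard-style verifiable reward in [0, 1].

    `pred` is the verifier's multi-hot prediction of which values the
    generated text expresses; `cond` is the intended condition vector.
    Both are boolean tensors of shape (cond_dim,).
    """
    inter = (pred & cond).sum()
    union = (pred | cond).sum().clamp(min=1)  # avoid division by zero
    return (inter / union).item()
```

In GRPO-style post-training, a reward of this form would be computed for each sampled completion in a group, and the resulting group-relative advantages would update only the adapter and gate parameters while the backbone remains frozen.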
Paper Type: Long
Research Area: Language Models
Research Area Keywords: fine-tuning, prompting, safety and alignment
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Data resources, Data analysis
Languages Studied: English
Submission Number: 4725