Keywords: Multi-Objective, RLHF, Large Language Models
TL;DR: A fine-tuning-free algorithm for aligning LLM outputs to a combination of objectives by approximating the optimal next-token distribution during LLM decoding.
Abstract: Large Language Models (LLMs) are increasingly deployed across diverse applications that demand balancing multiple, often conflicting, objectives, such as helpfulness, harmlessness, or humor. Aligning outputs to user-specific preferences in such multi-objective settings typically requires fine-tuning models for each objective or preference configuration, which is computationally expensive and inflexible. We introduce MAVIS (Multi-Objective Alignment via Value-Guided Inference-Time Search), a lightweight inference-time alignment framework that enables dynamic control over LLM behavior without modifying the base model's weights. MAVIS trains a set of small value models, each corresponding to a distinct objective. At inference time, these value models are combined using user-specified weights to produce a tilting function that adjusts the base model's output distribution toward desired trade-offs. The value models are trained using a simple iterative algorithm that ensures monotonic improvement of the KL-regularized policy. We show empirically that MAVIS outperforms baselines that fine-tune per-objective models and combine them post hoc, and even approaches the performance of the idealized setting where models are fine-tuned for a user's exact preferences.
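The tilting mechanism described in the abstract can be pictured as adding a weighted sum of per-objective value scores to the base model's next-token logits before sampling. Below is a minimal sketch of that idea, assuming each value model scores every candidate next token and that the tilt strength is controlled by a KL-regularization scale `beta`; all names here (`tilted_next_token_distribution`, `base_logits`, `value_scores`, `weights`, `beta`) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def tilted_next_token_distribution(base_logits, value_scores, weights, beta=1.0):
    """Combine a base LM's next-token logits with per-objective value scores.

    base_logits : Tensor of shape (vocab_size,), the base model's next-token logits.
    value_scores: list of Tensors, each (vocab_size,), one value model per objective,
                  scoring every candidate next token under that objective.
    weights     : list of floats, user-specified preference weights, one per objective.
    beta        : float, strength of the value-guided tilt (assumed KL-regularization scale).
    """
    # Weighted combination of the per-objective value scores.
    combined_value = sum(w * v for w, v in zip(weights, value_scores))
    # Tilt the base distribution: softmax(base_logits + beta * combined_value).
    return F.softmax(base_logits + beta * combined_value, dim=-1)


# Illustrative usage with random tensors standing in for real model outputs.
vocab_size = 32000
base_logits = torch.randn(vocab_size)
value_scores = [torch.randn(vocab_size) for _ in range(3)]  # e.g. helpfulness, harmlessness, humor
weights = [0.5, 0.4, 0.1]                                   # user-specified trade-off

probs = tilted_next_token_distribution(base_logits, value_scores, weights, beta=2.0)
next_token = torch.multinomial(probs, num_samples=1)
```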
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 15128