Inference time LLM alignment in single and multidomain preference spectrum

sadat shahriar; Zheng Qi; Nikolaos Pappas; Srikanth Doss; Kishaloy Halder; MONICA SUNKARA; Manuel Mager; Yassine Benajiba

Inference time LLM alignment in single and multidomain preference spectrum

sadat shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Kishaloy Halder, MONICA SUNKARA, Manuel Mager, Yassine Benajiba

26 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0

Keywords: LLM, Alignment, inference

TL;DR: This paper presents Alignment Vectors (AV), a new method for dynamically aligning LLMs during inference, enabling customizable outputs while reducing cost.

Abstract: Aligning Large Language Models (LLM) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed and inference-time ones typically require access to the reward model at each inference step. To address these limitations, we introduce an inference-time model alignment method that learns encoded representations of preference dimensions, called Alignment Vectors (AV). These representations are computed by subtracting the base model from the aligned model as in model editing enabling dynamically adjusting the model behavior during inference through simple linear operations. Even though the preference dimensions can span various granularity levels, here we focus on three gradual response levels across three specialized domains: medical, legal, and financial, exemplifying its practical potential. This new alignment paradigm introduces adjustable preference knobs during inference, allowing users to tailor their LLM outputs while reducing the inference cost by half compared to the prompt engineering approach. Additionally, we find that AVs are transferable across different fine-tuning stages of the same model, demonstrating their flexibility. AVs also facilitate multidomain, diverse preference alignment, making the process 12x faster than the retraining approach.

Primary Area: foundation or frontier models, including LLMs

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7739

Loading