Recast Your Input via a Mapping Function for Alignment

ICLR 2026 Conference Submission22338 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: alignment, input refiner, CMA-ES, posterior regularization
Abstract: Alignment plays an increasingly critical role in large language model (LLM) applications, ensuring the safety, controllability, and trustworthiness of generated text. Popular alignment methods, namely reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and their variants, update the model's weights with elaborate algorithms. However, they incur a heavy training cost, especially as parameter counts grow. Worse still, practitioners typically have no access to the weights of state-of-the-art models such as GPT-4, which renders these algorithms inapplicable. In this paper, we propose to employ a separate LM as a Refiner, essentially an input mapping function, that transforms the original query into a new formulation which steers the final generation toward the desired alignment. During optimization, an evolution strategy, namely CMA-ES, is leveraged to fine-tune the refiner LM in tandem with the generation model. We conduct extensive experiments across various refiner and generator types and achieve superior results.
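The black-box optimization loop described in the abstract can be illustrated with a simplified evolution-strategy sketch. This is not the authors' implementation and not full CMA-ES (covariance and step-size adaptation are replaced by a fixed annealing schedule); the `reward` function is a hypothetical stand-in for the alignment score the generation model's output would receive on the refined query.

```python
import random

def reward(params):
    # Hypothetical stand-in for the black-box alignment reward: in the
    # paper's setting this would score the generator's output on the
    # refined query. Here: negative squared distance to a target vector.
    target = [0.5, -1.0, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolution_strategy(dim=3, pop=16, elite=4, sigma=0.5, steps=200, seed=0):
    rng = random.Random(seed)
    mean = [0.0] * dim  # current estimate of the refiner's tunable parameters
    for _ in range(steps):
        # Sample a population of candidate parameter vectors around the mean.
        cands = [[m + sigma * rng.gauss(0, 1) for m in mean]
                 for _ in range(pop)]
        # Rank candidates by the (black-box) alignment reward.
        cands.sort(key=reward, reverse=True)
        # Move the mean toward the elite average (simplified update; full
        # CMA-ES would also adapt a covariance matrix and the step size).
        mean = [sum(c[i] for c in cands[:elite]) / elite for i in range(dim)]
        sigma *= 0.99  # simple annealing in place of CMA step-size control
    return mean

best = evolution_strategy()
```

Because only reward evaluations are needed, this loop never touches the generation model's weights, which is what makes the approach applicable to API-only models.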
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22338