Reward Model Aggregation

Published: 28 Oct 2023, Last Modified: 26 Nov 2023, Instruction Workshop @ NeurIPS 2023
Keywords: LLM alignment, reward aggregation
Abstract: Aligning language models requires guiding their outputs towards desired properties using reward models. This paper addresses the challenge of combining multiple reward models that target different objectives. We introduce methods for aggregating these rewards using logical operations. Experiments show that our methods outperform traditional aggregation techniques and underscore the importance of choosing proper reference values.
Submission Number: 68