Generalizing Offline Alignment Theoretical Paradigm with Diverse Divergence Constraints

Published: 17 Jun 2024, Last Modified: 02 Jul 2024 · ICML 2024 Workshop MHFAIA Poster · CC BY 4.0
Keywords: Offline Alignment, Human Preference-based Optimization, f-divergence
TL;DR: We generalize the offline alignment theoretical paradigm by incorporating f-divergence and develop a novel practical algorithm.
Abstract: The enhanced capabilities of large language models (LLMs) necessitate effective AI alignment. Learning from preference-based feedback has recently become a popular and promising approach to aligning LLMs with human preferences. Despite the impressive capabilities these aligned models demonstrate across various tasks, they lack a unified theoretical framework and a deeper theoretical understanding. In this work, we propose a unified theoretical paradigm for human preference-based optimization, termed Unified Preference Optimization (UPO), which we prove to be a generalization of $\Psi$PO. Because UPO generalizes existing practical algorithms, studying it yields a deeper theoretical understanding of them. Furthermore, we explore a specific instance of UPO obtained by setting the mapping to the identity, which yields a novel practical algorithm, Identity Unified Preference Optimization (IUPO). We show that IUPO generalizes IPO to diverse divergence constraints. In experiments fine-tuning GPT-2, JS-divergence-based IUPO (JS-IUPO) outperforms IPO.
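The abstract does not spell out the IUPO objective. As a rough illustration of what an IPO-style squared loss under a JS-divergence constraint might look like, the minimal sketch below replaces the usual log policy ratio with the derivative of the JS-divergence generator, as is done in f-divergence-based DPO variants; the function names, the `tau` parameter, and the exact form of the loss are illustrative assumptions, not the paper's algorithm.

```python
import math
import torch
import torch.nn.functional as F


def js_f_prime(log_ratio: torch.Tensor) -> torch.Tensor:
    # Derivative of the JS-divergence generator f(u) = u*log(u) - (1+u)*log((1+u)/2),
    # evaluated at u = exp(log_ratio): f'(u) = log(2u / (1+u)) = log(2) - softplus(-log_ratio).
    return math.log(2.0) - F.softplus(-log_ratio)


def js_iupo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 tau: float = 0.1) -> torch.Tensor:
    """Hypothetical IPO-style loss with the KL log-ratio replaced by the
    JS-divergence derivative f'(pi/pi_ref), applied to sequence-level log-probs."""
    chosen_log_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_log_ratio = policy_rejected_logps - ref_rejected_logps
    # Preference margin measured through the f-divergence derivative.
    h = js_f_prime(chosen_log_ratio) - js_f_prime(rejected_log_ratio)
    # As in IPO, regress the margin onto 1 / (2 * tau).
    return ((h - 1.0 / (2.0 * tau)) ** 2).mean()
```

With the reverse-KL generator, f'(u) differs from log(u) only by a constant that cancels in the pairwise margin, so this sketch reduces to the standard IPO loss in that case.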
Submission Number: 16