The N Implementation Details of RLHF with PPO

Shengyi Huang; Tianlin Liu; Leandro Von Werra

Published: 16 Feb 2024, Last Modified: 28 Mar 2024

Venue: BT@ICLR2024

License: CC BY 4.0

Keywords: Reinforcement learning from human feedback

Blogpost Url: https://iclr-blogposts.github.io/2024/blog/the-n-implementation-details-of-rlhf-with-ppo/

Abstract: Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for training modern language models such as ChatGPT. In this blog post, we explore OpenAI's first RLHF paper from 2019 and its accompanying open-source codebase, available at https://github.com/openai/lm-human-preferences. Our examination highlights important implementation details of RLHF that are often overlooked. We also show how to replicate OpenAI's original TensorFlow 1.x implementation in the contemporary PyTorch and JAX frameworks, offering a minimal reference implementation for RLHF.

Ref Papers: https://arxiv.org/abs/1909.08593

Id Of The Authors Of The Papers: ~Daniel_M_Ziegler1

Conflict Of Interest: None.

Submission Number: 2
