[Proposal] Fine-Grained Reward Modeling in LLMs: An RL, PRM, and Memory-Augmented Approach for Advanced Reasoning

20 Oct 2024 (modified: 05 Nov 2024) · THU 2024 Fall AML Submission · CC BY 4.0
Keywords: Reinforcement Learning, LLM, Working Memory, PRM, DNCs
TL;DR: We propose an LLM alignment method combining reinforcement learning, a process reward model, and a dynamic memory mechanism for enhanced reasoning and adaptability. It reduces manual labeling through automated LLM feedback and improves performance on complex, multi-step tasks.
Abstract: This study proposes an advanced framework for large language model (LLM) alignment, integrating Reinforcement Learning (RL), a Process Reward Model (PRM), and a dynamic memory mechanism. Unlike traditional RLHF approaches limited to basic reward criteria (e.g., “usefulness” and “toxicity”), our model incorporates fine-grained evaluation metrics like “contextual coherence” and “logical consistency.” By leveraging another LLM for automated feedback and implementing a gated memory system, the model adapts to multi-step tasks efficiently. This architecture offers improved scalability and accuracy, reducing reliance on manual labeling and enhancing inference quality.
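To make the reward design concrete, the sketch below shows one way the fine-grained criteria named in the abstract could be combined into per-step scalar rewards in a PRM-like fashion. The criterion names, the equal weighting, and the fact that scores arrive as plain numbers (rather than from an LLM judge call) are illustrative assumptions for this sketch, not details specified by the proposal.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StepScores:
    """Fine-grained scores for one reasoning step, each assumed to lie in [0, 1].

    In the proposed framework these would be produced by an external LLM judge;
    here they are plain inputs so the sketch stays self-contained.
    """
    contextual_coherence: float
    logical_consistency: float


def process_reward(steps: List[StepScores],
                   weights: Dict[str, float] = None) -> List[float]:
    """Combine fine-grained criteria into a per-step scalar reward.

    The linear weighting scheme is an assumption made for illustration only.
    """
    weights = weights or {"contextual_coherence": 0.5,
                          "logical_consistency": 0.5}
    rewards = []
    for s in steps:
        r = (weights["contextual_coherence"] * s.contextual_coherence
             + weights["logical_consistency"] * s.logical_consistency)
        rewards.append(r)
    return rewards


if __name__ == "__main__":
    # Two hypothetical reasoning steps scored by an external judge.
    trace = [StepScores(0.9, 0.8), StepScores(0.6, 0.4)]
    print(process_reward(trace))  # [0.85, 0.5]
```

The per-step rewards produced this way would then serve as the training signal for the RL stage, replacing a single sequence-level score with dense feedback over the reasoning trace.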
Submission Number: 12