Orchestrating Human and AI Feedback: PCUI-DPO for Human-Aligned LLM Responses

ACL ARR 2024 June Submission 2070 Authors

15 Jun 2024 (modified: 19 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This paper proposes a novel approach to Large Language Model (LLM) training that prioritizes AI-generated responses, reducing reliance on extensive human feedback. We introduce the Predicted Confidence and Uncertainty Index (PCUI), a metric that adds a new dimension of LLM interpretability by capturing both confidence and uncertainty in generated text. Integrating PCUI into Direct Preference Optimization (DPO) guides the model toward favoring its own high-confidence responses during training. Notably, a PCUI-based confidence threshold enables the model to prioritize AI-generated responses that exceed the threshold over human-provided feedback. This approach promotes a gradual shift toward automated LLM training while preserving interpretability and control. We demonstrate the effectiveness of this method on text generation tasks, achieving significant performance improvements. This work lays the groundwork for a future in which AI and human feedback collaborate to create more robust and user-centric LLMs.
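
The abstract does not specify the exact form of PCUI or of the thresholded preference construction, so the following is only a minimal Python sketch under assumptions of our own: pcui() is a hypothetical stand-in that scores a response from its token log-probabilities (mean per-token confidence minus log-probability variance), and build_preference_pair() prefers the model's own response over the human one as the "chosen" completion for DPO only when that score exceeds a confidence threshold. The function names and the threshold value are illustrative, not the authors' implementation.

    import math

    def pcui(token_logprobs):
        # Hypothetical stand-in for the Predicted Confidence and Uncertainty
        # Index (PCUI): mean token confidence (exp of log-prob) penalized by
        # the variance of the log-probs as a rough uncertainty term.
        confidences = [math.exp(lp) for lp in token_logprobs]
        mean_conf = sum(confidences) / len(confidences)
        mean_lp = sum(token_logprobs) / len(token_logprobs)
        variance = sum((lp - mean_lp) ** 2 for lp in token_logprobs) / len(token_logprobs)
        return mean_conf - variance  # higher = more confident, less uncertain

    def build_preference_pair(model_response, model_logprobs,
                              human_response, threshold=0.7):
        # Prefer the model's own response over human feedback when its PCUI
        # exceeds the confidence threshold; otherwise keep the human response
        # as the "chosen" completion for DPO training.
        if pcui(model_logprobs) >= threshold:
            return {"chosen": model_response, "rejected": human_response}
        return {"chosen": human_response, "rejected": model_response}

As a usage illustration, token log-probabilities of [-0.1, -0.2, -0.15] give a PCUI of roughly 0.86 under this toy formulation, which exceeds the assumed 0.7 threshold, so the model's own response would be taken as the preferred completion for that pair.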
Paper Type: Short
Research Area: Human-Centered NLP
Research Area Keywords: LLM, Human-Alignment
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 2070