Persuade with Reason: Enhancing Debate Persuasiveness through Accurate Persuasion Feedback Derived from Weak Supervised Labels
Keywords: Debate Generation, Reinforcement Learning
Abstract: Existing methods for debate generation often struggle to produce convincing arguments and lack the persuasiveness the task demands. More challengingly, directly fine-tuning large language models (LLMs) or applying RLHF can decrease the persuasiveness of the generated text, making it difficult to leverage advances in state-of-the-art LLMs. We identify two key problems underlying this degradation: reward hacking and reward sparsity. Reward hacking blurs the model's training objective, causing it to focus on linguistic style and rhetoric while neglecting essential logical reasoning and value shaping. Reward sparsity reduces the generalization and robustness of the reward model. To address these two problems, we propose a novel persuasiveness enhancement training method: $\rm P^{3}$. First, we introduce \underline{\textbf{P}}ersuasive reward estimation and modeling, which separates persuasiveness scores from surface cues to address the reward hacking problem. Second, we mitigate the reward sparsity issue with \underline{\textbf{P}}ersuasive sample mining, which extracts persuasiveness annotations from weakly supervised labels. Lastly, we design a new DPO algorithm tailored for \underline{\textbf{P}}ersuasiveness generation optimization, which modifies the objective function to mitigate the divergence problem in the debate generation task. Extensive experimental results demonstrate that $\rm P^{3}$ effectively alleviates the aforementioned issues, significantly enhancing the model's performance in debate and persuasion tasks and surpassing state-of-the-art closed-source commercial models, such as Gemini and Claude, in both automatic and human evaluations.
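For context, a minimal sketch of the standard DPO objective that $\rm P^{3}$ builds on; the abstract does not specify the exact modification, so only the unmodified loss is shown here, with the preferred/dispreferred pair assumed to come from the mined persuasive samples:
$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]$$
where $y_w$ and $y_l$ denote the more and less persuasive responses to prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen reference policy, and $\beta$ controls the strength of the implicit KL constraint; the paper's persuasiveness-tailored variant alters this objective to counter the divergence issue it reports.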
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 16421