Abstract: Large language models (LLMs) often succumb to users' viewpoints when faced with conflicting perspectives.
We identify two key biases underlying this issue: stance homogeneity bias and human preference bias. To address these biases, we propose a novel two-stage training framework: Multi-stance Discussion Sampling and Truth Alignment Training (MDTA).
First, we introduce an equal multi-stance discussion framework that automatically generates multi-model discussion data. Based on this framework, we construct Eq-Discussion, the first and largest multi-model fair discussion dataset, and use it for supervised fine-tuning to reduce stance homogeneity bias. Second, we optimize Reinforcement Learning from Human Feedback (RLHF) to align with discussion correctness rather than human preference, mitigating human preference bias.
Extensive experimental results demonstrate that MDTA effectively reduces both biases and significantly enhances the performance of LLMs across a variety of downstream tasks, including reading comprehension, logical reasoning, and social question answering. Furthermore, we observe that MDTA improves the generalization capabilities of LLMs, leading to substantial performance improvements in non-discussion scenarios and on out-of-domain datasets.
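A minimal sketch of the two-stage idea described above, assuming a generic causal LM and an illustrative data format; the helper names, reward shaping, and REINFORCE-style update below are assumptions for exposition, not the authors' implementation.

    # Python sketch: stage 1 fine-tunes on multi-stance discussion transcripts,
    # stage 2 rewards discussion correctness instead of human preference.
    # `model` is assumed to be any callable returning token logits (batch, seq, vocab).
    import torch
    import torch.nn.functional as F

    def sft_step(model, optimizer, input_ids, labels):
        """Stage 1: one supervised step of next-token cross-entropy on a discussion transcript."""
        logits = model(input_ids)                            # (batch, seq, vocab)
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            labels[:, 1:].reshape(-1),
            ignore_index=-100,                               # mask prompt / non-response tokens
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    def correctness_reward(generated_answer: str, gold_answer: str) -> float:
        """Stage 2 reward (assumed form): 1 if the discussion outcome matches the gold answer."""
        return 1.0 if generated_answer.strip() == gold_answer.strip() else 0.0

    def reinforce_step(model, optimizer, input_ids, response_ids, reward):
        """Stage 2: REINFORCE-style policy update weighted by the correctness reward."""
        logits = model(torch.cat([input_ids, response_ids], dim=1))
        resp_logits = logits[:, input_ids.size(1) - 1 : -1]  # logits that predict response tokens
        log_probs = F.log_softmax(resp_logits, dim=-1)
        token_logp = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
        loss = -(reward * token_logp.sum(dim=-1)).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

The key design point illustrated here is that the stage-2 reward signal is tied to whether the discussion reaches the correct answer, rather than to a learned human-preference score.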
Paper Type: Long
Research Area: Language Modeling
Research Area Keywords: fine-tuning, reinforcement learning, model bias/unfairness mitigation, commonsense QA, reading comprehension, logical reasoning
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 2342