COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values

ACL ARR 2025 May Submission 3145 Authors

19 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Existing Chinese preference datasets suffer from limited scale, restricted domain coverage, and insufficiently rigorous data validation, and their reliance on human annotation severely limits scalability. As a result, alignment for Chinese LLMs and Chinese Reward Models (CRMs) remain underexplored. To address these challenges, we design an LLM-based data annotation pipeline that requires no human intervention. Using this pipeline, we curate COIG-P (Chinese Open Instruction Generalist - Preference), a high-quality, large-scale Chinese preference dataset consisting of 1M Chinese preference pairs and 92k carefully curated Chinese queries spanning diverse domains, including Chat, Coding, Math, and others. We verify the quality of COIG-P along two dimensions. (1) Applying DPO with COIG-P yields significant gains of 2% to 12% for the Qwen2/2.5 and Infinity-Instruct model series on AlignBench (Liu et al., 2024), significantly outperforming existing Chinese preference datasets. (2) We train an 8B-parameter CRM and manually annotate a Chinese Reward Benchmark (CRBench). Our CRM demonstrates robust scoring ability on CRBench, and in practical data construction experiments, the data it produces is comparable in quality to that produced by GPT-4o.
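The DPO experiments described above fine-tune a model on preference pairs. As a minimal sketch of how such a dataset could be used, the snippet below runs DPO with the Hugging Face TRL library; the dataset path, model name, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal DPO sketch with Hugging Face TRL, assuming preference pairs
# stored with "prompt", "chosen", and "rejected" fields.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# One model from the Qwen2/2.5 series evaluated in the paper.
model_name = "Qwen/Qwen2.5-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical dataset path; substitute the released COIG-P location.
dataset = load_dataset("m-a-p/COIG-P", split="train")

config = DPOConfig(
    output_dir="qwen2.5-7b-dpo-coigp",
    beta=0.1,  # strength of the implicit KL penalty toward the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,           # ref_model is omitted; TRL clones the policy as reference
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```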
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Safety and Alignment in LLMs
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: Chinese
Submission Number: 3145