COIG-Writer: A High-Quality Chinese Creative Writing with Thought Process Dataset

17 Sept 2025 (modified: 17 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: LLM, Chinese Creative Writing, Process-Augmented Data
TL;DR: We introduce a novel Chinese creative writing dataset that incorporates authors' genuine thought processes to train AI models to generate content with greater depth and personality.
Abstract: Large language models exhibit systematic deficiencies in creative writing, particularly in non-English contexts where training data is scarce and lacks process-level supervision. We present COIG-Writer, a novel Chinese creative writing dataset that captures both diverse outputs and their underlying thought processes through systematic reverse-engineering of high-quality texts. Unlike existing datasets that provide only input-output pairs, COIG-Writer comprises 1,665 meticulously curated triplets spanning 51 genres, each containing: (1) a reverse-engineered prompt, (2) detailed creative reasoning documenting the author's decision-making process, and (3) the final text. Through comprehensive experiments, we identify a two-component model of creative writing: narrative logic (provided by process supervision) and linguistic expression (maintained by general-purpose data). Our findings reveal three critical insights: (1) process supervision requires at least 10k general samples for stabilization; below this threshold, performance degrades monotonically (35.78% → 42.16% → 50.00% → 62.75%); (2) creative capabilities are culturally bound, with no cross-lingual transfer (an 89.26 pp gap between Chinese and English performance); and (3) lexical diversity inversely correlates with creative quality (the TTR paradox), suggesting that high diversity signals compensatory behavior for logical deficiencies. These findings establish that creative excellence emerges from the interaction between logical scaffolding and linguistic grounding, analogous to how mathematical reasoning enhances but cannot replace linguistic competence in foundation models. Dataset available at: https://anonymous.4open.science/r/COIG-Writer
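To make the triplet structure described in the abstract concrete, below is a minimal Python sketch of what a single COIG-Writer record might look like. The field names (prompt, reasoning, text, genre) and the example values are illustrative assumptions for exposition only, not the released schema.

```python
# A minimal sketch of one COIG-Writer triplet, assuming hypothetical
# field names; consult the released dataset for the actual schema.
from dataclasses import dataclass


@dataclass
class WritingTriplet:
    prompt: str     # reverse-engineered prompt recovered from the finished text
    reasoning: str  # documented creative decision-making process
    text: str       # the final high-quality creative text
    genre: str      # one of the 51 covered genres


# Hypothetical example record (values truncated for illustration).
example = WritingTriplet(
    prompt="Write the opening scene of a wuxia short story ...",
    reasoning="Open mid-duel to establish stakes; withhold the hero's name ...",
    text="...",
    genre="wuxia fiction",
)
print(example.genre)
```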
Primary Area: datasets and benchmarks
Submission Number: 8577