Data-Driven Creativity: Amplifying Imagination in LLM Writing

27 Sept 2024 (modified: 05 Feb 2025)Submitted to ICLR 2025EveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLM, RLHF
TL;DR: We focus on creative writing tasks and propose an expert-assisted data generation process that can efficiently obtain high-quality ordinal data with minimal human resources. We validat the effectiveness of our method through RLHF..
Abstract: During the alignment training of Large Language Models (LLMs), Reinforcement Learning from Human Feedback (RLHF) has proven to be effective in enhancing the model's alignment with human preferences. The RLHF approach requires human annotators to provide data representative of human preferences, aiding the model in advancing towards human-preferred outcomes. In this process, high-quality human preference data is both crucial and challenging to obtain. While many tasks, such as coding and mathematics, can now be more efficiently annotated through Artificial Intelligence Feedback (AIF), numerous tasks still necessitate human input to provide human preference signals. Particularly creative tasks are typical tasks that involving complex human preference. Here, we focus on creative writing tasks and investigate how to collaborate with annotators to acquire high-quality, superior data. We propose an expert-assisted data generation process, named Expert-Objective-Personal-Subjective (EOPS), that can efficiently obtain high-quality ordinal data with minimal human resources. We conduct experiments on three kinds of tasks, and experimental results validat the effectiveness of our method.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8790
Loading