Dr Genre: Reinforcement Learning from Decoupled LLM Feedback for Generic Text Rewriting

ACL ARR 2025 May Submission365 Authors

11 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, Readers: Everyone, License: CC BY 4.0

Abstract: Generic text rewriting is a prevalent large language model (LLM) application that covers diverse real-world tasks, such as style transfer, fact correction, and email editing. These tasks vary in rewriting objectives (e.g., factual consistency vs. semantic preservation), making it challenging to develop a unified model that excels across all dimensions. Existing methods often specialize in either a single task or a specific objective, limiting their generalizability. In this work, we introduce a generic model proficient in factuality, stylistic, and conversational rewriting tasks. To simulate real-world user rewrite requests, we use LLMs to construct ChatRewrite, a conversational rewrite dataset with "natural"-sounding instructions, from raw emails. Combined with other popular rewrite datasets, including LongFact for the factuality rewrite task and RewriteLM for the stylistic rewrite task, ChatRewrite forms a broad benchmark for training and evaluating generic rewrite models. To align with task-specific objectives, we propose Dr Genre, a Decoupled-reward learning framework for Generic rewriting that uses objective-oriented reward models with task-specific weighting. Evaluation shows that Dr Genre delivers higher-quality rewrites across all targeted tasks, improving on objectives including instruction following (agreement), internal consistency (coherence), and minimal unnecessary edits (conciseness).

Paper Type: Long

Research Area: Generation

Research Area Keywords: automatic evaluation, few-shot generation, text-to-text generation

Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data resources

Languages Studied: English

Keywords: automatic evaluation, few-shot generation, text-to-text generation

Submission Number: 365
