Leveraging Human Preferences to Master Poetry

Rafael Pardinas; Gabriel Huang; David Vazquez; Alexandre Piché

21 Nov 2022 (modified: 05 May 2023) | creativeAI | Readers: Everyone

Keywords: Reinforcement Learning from Human Preferences, Creative AI, Poetry, Haiku

TL;DR: Use RL from human feedback to write better Haiku

Abstract: Large language models have been fine-tuned to learn poetry by supervised learning on a dataset containing relevant examples. However, those models do not generate good-quality output that respects the structure expected for a specific poem type. For instance, generated haikus may contain toxic language, be off-topic or incoherent, and fail to respect the typical 5-7-5 syllable meter. In this work, we investigate whether it is possible to learn, from human feedback, an objective function that quantifies the quality of a haiku, and whether this reward function can be used to improve haiku generation using reinforcement learning.
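As an illustration of what learning a reward function from human preferences can look like, here is a minimal sketch of a pairwise preference loss. It assumes a Bradley-Terry style objective, which is a common choice in this area but is not stated in the abstract; the function names and the scalar reward inputs are hypothetical and stand in for the outputs of a reward model scoring two haikus.

```python
import math


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the human-preferred haiku wins.

    Assumes a Bradley-Terry model (an assumption, not taken from the abstract):
    P(preferred beats rejected) = sigmoid(reward_preferred - reward_rejected).
    """
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(margin)) equals log(1 + exp(-margin)).
    # Both branches compute that value without overflowing exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs):
    """Average the loss over (reward_preferred, reward_rejected) pairs."""
    return sum(preference_loss(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, minimizing this loss pushes the reward of the preferred haiku above that of the rejected one; the resulting scalar reward could then serve as the signal for a reinforcement learning step.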

Submission Type: archival

Presentation Type: online

Presenter: Rafael Pardinas
