Keywords: computational creativity, model alignment
Abstract: We explore the potential for creative neural network models to learn from each other. To enable this, we train two policy models with exactly the same architecture, configuration, and data, but with different random seeds. We then obtain a judge model that automatically rates performance. We draw samples from each policy model, rate them, and optimize each model by maximizing the probability of the better sample and minimizing the probability of the worse sample, applying a popular model alignment technique, Direct Preference Optimization (DPO), in an online manner. The results show that our approach effectively improves model performance on three distinct, open-ended creative tasks: symbolic music generation, lyric generation, and lyric translation. However, it shows minimal benefit for a closed-ended task, Georgian-to-English machine translation.
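To make the preference-optimization step concrete, below is a minimal sketch of the DPO objective the abstract refers to, assuming the standard formulation (Rafailov et al., 2023): the judge-preferred sample is treated as "chosen" and the other as "rejected". The tensor names and dummy values are illustrative, not taken from the submission.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: raise the policy's likelihood of the judge-preferred
    (chosen) sample and lower it for the dispreferred (rejected) sample,
    both measured relative to a frozen reference model."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy usage with dummy sequence log-probabilities for one preference pair
# (in the online setting, the pair would come from the two seed-variant
# policies and be ordered by the judge model's ratings).
policy_chosen = torch.tensor([-12.3], requires_grad=True)
policy_rejected = torch.tensor([-11.8], requires_grad=True)
ref_chosen = torch.tensor([-12.5])
ref_rejected = torch.tensor([-11.9])

loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)
loss.backward()  # gradients push the chosen sample up and the rejected one down
```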
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 27