Keywords: music, reinforcement learning, airl, deep learning
Abstract: Most recent approaches to automatic music harmony composition adopt deep supervised learning to train a model on a set of human-composed songs. However, these approaches suffer from inherent limitations of the chosen deep learning models, which may lead to unpleasing harmonies. This paper explores an alternative approach to harmony composition using a novel combination of deep supervised learning, deep reinforcement learning, and inverse reinforcement learning techniques. In this approach, our model selects the next chord in the composition (action) based on the previous notes (states), allowing us to model harmony composition as a reinforcement learning problem in which we seek to maximize an overall accumulated reward. However, designing an appropriate reward function is known to be a difficult process. To overcome this problem, we propose learning a reward function from a set of human-composed tracks using Adversarial Inverse Reinforcement Learning (AIRL). We first train a Bi-axial LSTM model using supervised learning and then improve it by fine-tuning with Deep Q-learning. Instead of using GANs to directly generate music compositions that resemble human compositions, we adopt GANs to learn a reward function over music trajectories from human compositions. We then combine the learned reward with a reward based on music theory rules to improve the output of the supervised model. The results show improvement over the pre-trained model without reinforcement learning fine-tuning, with respect to a set of objective metrics and preference in a subjective user evaluation.
One-sentence Summary: Polyphonic music composition using adversarial inverse reinforcement learning
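To make the pipeline described in the abstract concrete, the sketch below illustrates one way a discriminator-derived AIRL reward could be blended with a rule-based music-theory reward inside a Deep Q-learning update. This is a minimal illustration, not the authors' implementation: the encoding sizes, mixing weights, network shapes, and the `theory_reward` placeholder are all assumptions.

```python
# Minimal sketch (not the paper's code): combining an AIRL-learned reward
# with a music-theory reward inside a DQN-style update.
# STATE_DIM/ACTION_DIM, GAMMA, and the mixing weights are assumed values.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 64, 16      # assumed sizes of note-history / chord encodings
GAMMA = 0.99                        # discount factor (assumed)
W_AIRL, W_THEORY = 0.5, 0.5         # reward mixing weights (assumed)

# Discriminator trained adversarially against the generator's trajectories;
# in AIRL, log D(s,a) - log(1 - D(s,a)) serves as the learned reward signal.
disc = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, ACTION_DIM))
optim_q = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def airl_reward(state, action):
    """Learned reward: log D(s,a) - log(1 - D(s,a)), with D = sigmoid(disc output)."""
    logit = disc(torch.cat([state, action], dim=-1))
    return torch.nn.functional.logsigmoid(logit) - torch.nn.functional.logsigmoid(-logit)

def theory_reward(state, action):
    """Placeholder for a hand-crafted music-theory reward (e.g. penalizing dissonant chords)."""
    return torch.zeros(state.shape[0], 1)

def dqn_step(state, action_idx, next_state):
    """One Q-learning update using the combined (learned + rule-based) reward."""
    action_onehot = torch.nn.functional.one_hot(action_idx, ACTION_DIM).float()
    with torch.no_grad():
        r = W_AIRL * airl_reward(state, action_onehot) + W_THEORY * theory_reward(state, action_onehot)
        target = r + GAMMA * q_net(next_state).max(dim=-1, keepdim=True).values
    q_sa = q_net(state).gather(1, action_idx.unsqueeze(1))
    loss = torch.nn.functional.mse_loss(q_sa, target)
    optim_q.zero_grad()
    loss.backward()
    optim_q.step()
    return loss.item()
```

In this sketch the Q-network stands in for the pre-trained generator being fine-tuned; in the paper's setting the Bi-axial LSTM would supply the policy being improved, which is not reproduced here.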