Abstract: While recent work has shown that it is possible to find disentangled directions in the latent space of image generative networks, finding directions in the latent space of sequential models for music generation remains a largely unexplored topic. In this work, we propose a method for discovering linear directions in the latent space of a music-generating Variational Auto-Encoder (VAE). We use Principal Component Analysis (PCA), a statistical method that transforms the input data so that the variance along the new axes is maximized. We apply PCA to the latent space activations of our model and find largely disentangled directions that change the style and characteristics of the input music. Our experiments show that the found directions are often monotonic and global, and that they encode fundamental musical characteristics such as colorfulness, speed, and repetitiveness. Moreover, we propose a set of quantitative metrics that describe different musical styles and characteristics, and we use them to evaluate our results. We show that the found directions decouple content and can be utilized for style transfer and conditional music generation tasks. Our project page can be found at http://catlab-team.github.io/midispace.
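The core idea of applying PCA to latent activations and then moving a latent code along a found direction can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pca_directions` and `move_along`, the latent dimensionality of 16, the step size `alpha`, and the randomly generated stand-in latents are all assumptions made for illustration. In practice the latent codes would come from the encoder of the music VAE, and the edited code would be passed to its decoder.

```python
import numpy as np

def pca_directions(latents, n_components):
    """Return the mean, the top principal directions, and their variances
    for an array of latent codes with shape (n_samples, latent_dim)."""
    mean = latents.mean(axis=0)
    centered = latents - mean
    # SVD of the centered data: the rows of vt are unit-norm principal
    # directions, ordered by decreasing explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values**2 / (latents.shape[0] - 1)
    return mean, vt[:n_components], explained_variance[:n_components]

def move_along(z, direction, alpha):
    """Shift a latent code z by alpha along one unit-norm direction."""
    return z + alpha * direction

# Hypothetical stand-in for latent codes (assumption: 1000 samples, 16 dims).
# Real codes would be obtained by encoding music sequences with the VAE.
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 16)) * np.linspace(3.0, 0.1, 16)

mean, directions, variance = pca_directions(latents, n_components=4)

# Edit one latent code along the first principal direction.
z_edited = move_along(latents[0], directions[0], alpha=2.0)
```

Sweeping `alpha` over a range of negative and positive values and decoding each edited code is one way to check whether a direction changes a musical characteristic monotonically.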