Photong: Generating 16-Bar Melodies from Images

21 Nov 2022 (modified: 05 May 2023), creativeAI, Readers: Everyone
Keywords: machine learning, computer vision, music generation, sound and music computing
TL;DR: A VAE-based pipeline that generates cohesive 16-bar MIDI melodies from images through emotion detection and modality transfer using feature embeddings.
Abstract: This work studies whether melodies can be generated from an arbitrary image using deep neural networks. We propose a VAE-based pipeline that generates cohesive 16-bar MIDI melodies from images through emotion detection and modality transfer using feature embeddings. The pipeline is implemented with an image encoder, a MIDI VAE, and three bridging computer vision models. We evaluate the system by examining the musical features of four distinct outputs to see how well they capture the features of the input images.
Submission Type: archival
Presentation Type: online
Presenter: Yanjia Zhang