Face-To-Music: Music Generation Based on Facial Emotions

Published: 2024, Last Modified: 07 Jan 2026, ICCE-Taiwan 2024, CC BY-SA 4.0
Abstract: We have developed an artificial intelligence system that extracts emotions from uploaded photos and generates music matching those emotions. According to Russell's circumplex model, human emotions can be mapped onto a two-dimensional plane; we exploit this by linking the facial emotions detected with the FER2013 dataset to music in the MetaMIDI dataset. The process for creating music suited to a specific facial emotion is as follows. The emotion information extracted from the photo is transformed into a two-dimensional valence-arousal vector. We then identify the piece in the MetaMIDI dataset whose valence-arousal vector is closest to it. This selected piece is fed into one of five Music Transformers, each corresponding to one of the detected emotions. The Music Transformer then generates one minute of music conditioned on the first five seconds of the input piece, producing music that closely reflects the detected facial emotion.
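The retrieval step described above — mapping a detected emotion to a valence-arousal vector and selecting the nearest track — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the emotion-to-coordinate table and the toy track catalogue are invented for demonstration, and the distance metric is assumed to be Euclidean.

```python
import numpy as np

# Hypothetical valence-arousal coordinates for some FER2013 emotion classes,
# laid out per Russell's circumplex model (values are illustrative only).
EMOTION_VA = {
    "happy":   ( 0.8,  0.5),
    "sad":     (-0.7, -0.4),
    "angry":   (-0.6,  0.7),
    "fear":    (-0.6,  0.6),
    "neutral": ( 0.0,  0.0),
}

def nearest_track(emotion: str, track_va: np.ndarray) -> int:
    """Return the index of the track whose (valence, arousal) vector
    lies closest (Euclidean distance) to the detected emotion's vector."""
    target = np.array(EMOTION_VA[emotion])
    dists = np.linalg.norm(track_va - target, axis=1)
    return int(np.argmin(dists))

# Toy catalogue: three tracks with annotated (valence, arousal) vectors.
tracks = np.array([(0.9, 0.6), (-0.8, -0.5), (-0.5, 0.7)])
print(nearest_track("sad", tracks))  # → 1 (the low-valence, low-arousal track)
```

The selected track's opening seconds would then serve as the prompt for the emotion-specific Music Transformer that continues it.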