Abstract: Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models are limited to simple images and cannot generate music from complex digitized artworks. To address this gap, we introduce \(\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}\), a novel model designed to create music from digitized artworks or text inputs. \(\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}\) extends the AudioLDM 2 architecture, a text-to-audio model, and employs our newly curated datasets, created via ImageBind, which pair digitized artworks with music. Experimental results demonstrate that \(\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}\) can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive installations, and AI-driven creative tools. The code is publicly available at: https://github.com/justivanr/art2mus_.
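The abstract states only that the artwork–music pairs were created via ImageBind, without detailing the pairing procedure. As a rough illustration of how such cross-modal pairing could work, the sketch below embeds artworks and candidate music clips into ImageBind's shared embedding space and matches each artwork to its nearest clip; the file paths, the candidate music pool, and the nearest-neighbour selection rule are hypothetical and not taken from the paper.

```python
# Minimal sketch (not the authors' exact pipeline): pairing digitized artworks
# with music clips via ImageBind's joint embedding space. The file lists and
# nearest-neighbour matching rule below are assumptions for illustration.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical file lists; replace with real artwork images and music clips.
artwork_paths = ["artworks/starry_night.jpg", "artworks/the_scream.jpg"]
music_paths = ["music/clip_001.wav", "music/clip_002.wav", "music/clip_003.wav"]

# Pretrained ImageBind encoder producing embeddings in a shared space.
model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

inputs = {
    ModalityType.VISION: data.load_and_transform_vision_data(artwork_paths, device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(music_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Cosine similarity between every artwork and every candidate music clip.
img_emb = torch.nn.functional.normalize(embeddings[ModalityType.VISION], dim=-1)
aud_emb = torch.nn.functional.normalize(embeddings[ModalityType.AUDIO], dim=-1)
similarity = img_emb @ aud_emb.T  # shape: [num_artworks, num_clips]

# Pair each artwork with its most similar music clip.
best_match = similarity.argmax(dim=-1)
for art, idx in zip(artwork_paths, best_match.tolist()):
    print(f"{art} -> {music_paths[idx]}")
```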