Keywords: music generation, Human-AI music co-creativity, Music information retrieval, Impacts on music education, Implications for future musicians, Challenges in commercializing AI music tools, Emerging opportunities of AI music, music retrieval systems, creative practice, artistically-inspired generative tasks, RAG, Human-centered MIR, multimodality, industry applications
TL;DR: This paper evaluates the extent to which expertise in prompt construction influences the quality of generated music.
Abstract: This paper evaluates the extent to which expertise in prompt construction influences the quality of generated music. We propose a Retrieval-Augmented Prompt Rewrite system (RAG) that uses CLAP to transform novice prompts into expert descriptions. Our method helps preserve user intent and removes the need for extensive domain training of the user. Given novice-level prompts, participants selected relevant terms from the top-k most textually or audibly similar MusicCaps captions; these terms were fed into GPT-3.5 to create succinct, expert-level rewrites, which were then used to generate music with Stable Audio 2.0. To mitigate anchoring bias toward expert prompts, we implemented a counterbalanced design and conducted a subjective study to evaluate the effectiveness of RAG. As a baseline, we generated rewrites with a conventional LoRA fine-tuning method. Participants rated the expertness, musicality, production quality, and preference of music generated from novice and expert prompts. Both RAG and LoRA rewrites significantly improved music generation across all NLP and subjective metrics, with RAG outperforming LoRA overall. Finally, the subjective results largely align with Meta’s Audiobox Aesthetics metrics.
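The retrieval step described in the abstract (finding the top-k captions most similar to a novice prompt) can be illustrated with a minimal sketch. It assumes that prompts and captions have already been embedded as vectors; the toy 3-dimensional vectors, the example captions, and the helper names `cosine` and `top_k_captions` are hypothetical stand-ins for illustration, not the authors' implementation or real CLAP or MusicCaps data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_captions(query_vec, caption_vecs, captions, k=3):
    """Return the k captions whose embeddings are closest to the query."""
    scored = sorted(
        zip(captions, caption_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [caption for caption, _ in scored[:k]]

# Toy data (assumption): made-up captions and 3-d vectors in place of
# real caption text and real embedding vectors.
captions = [
    "lo-fi hip hop beat with vinyl crackle",
    "fast orchestral strings",
    "mellow jazz piano trio",
]
caption_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.6, 0.1]]
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of a novice prompt

print(top_k_captions(query_vec, caption_vecs, captions, k=2))
```

In the workflow the abstract describes, the retrieved captions would then be shown to participants, who pick the relevant terms that are passed to the language model for rewriting.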
Track: Demo Track
Confirmation: Demo Track: I confirm that I have followed the formatting guideline and included all author names and affiliations.
(Optional) Supplementary Material: zip
(Optional) Short Video Recording File: mp4
Submission Number: 39