Keywords: Inference and Prompt Engineering of Music LLMs, In-Context Learning of Music LLMs
TL;DR: This paper evaluates the extent to which expertise in prompt construction influences the quality of music generation and proposes a Retrieval-Augmented Prompt Rewrite (RAG) system that transforms novice prompts into expert-level descriptions using CLAP and GPT-3.5.
Abstract: This paper evaluates the extent to which expertise in prompt construction influences the quality of music generation output. We propose a Retrieval-Augmented Prompt Rewrite (RAG) system that transforms novice prompts into expert-level descriptions using CLAP. Our method helps preserve user intent while bypassing the need for extensive domain training on the user's part. Given novice-level prompts, participants selected relevant terminology from the top-k most textually or audibly similar MusicCaps captions, which was then fed into GPT-3.5 to create succinct, expert-level rewrites. These rewrites were then used to generate music with Stable Audio 2.0. To mitigate anchoring bias toward expert prompts, we implemented a counterbalanced design and conducted a subjective study to evaluate the effectiveness of RAG. As a baseline, we generated rewrites using a traditional LoRA fine-tuning method. Participants evaluated the expertness, musicality, production quality, and overall preference of music generated from novice and expert prompts. Both RAG and LoRA rewrites significantly improve music generation across all NLP and subjective metrics, with RAG outperforming LoRA overall. Finally, the subjective results largely align with Meta's Audiobox Aesthetics metrics.
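The retrieval-and-rewrite step described in the abstract could look roughly like the sketch below. This is an illustrative reconstruction, not the authors' implementation: it assumes the `laion_clap` and `openai` Python packages, a locally available list of MusicCaps captions (`musiccaps_captions`), and a placeholder rewrite instruction; model choices and hyperparameters (e.g. k = 5) are assumptions.

```python
# Sketch: retrieve the top-k MusicCaps captions most similar to a novice prompt
# using CLAP text embeddings, then ask GPT-3.5 to rewrite the prompt with
# terminology borrowed from those captions. Not the authors' code.
import numpy as np
import laion_clap
from openai import OpenAI


def top_k_captions(novice_prompt: str, captions: list[str], k: int = 5) -> list[str]:
    clap = laion_clap.CLAP_Module(enable_fusion=False)
    clap.load_ckpt()  # loads the default pretrained CLAP checkpoint
    embs = clap.get_text_embedding([novice_prompt] + captions)  # shape (1 + N, D)
    query, pool = embs[0], embs[1:]
    # Cosine similarity between the novice prompt and every caption in the pool.
    sims = pool @ query / (np.linalg.norm(pool, axis=1) * np.linalg.norm(query) + 1e-8)
    return [captions[i] for i in np.argsort(-sims)[:k]]


def rewrite_prompt(novice_prompt: str, retrieved: list[str]) -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    context = "\n".join(f"- {c}" for c in retrieved)
    messages = [
        {"role": "system",
         "content": "Rewrite the user's music prompt as a succinct, expert-level "
                    "description, borrowing relevant terminology from the captions."},
        {"role": "user",
         "content": f"Novice prompt: {novice_prompt}\n\nSimilar captions:\n{context}"},
    ]
    resp = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    return resp.choices[0].message.content


# Usage (assuming musiccaps_captions is a list of caption strings):
# expert = rewrite_prompt("a chill beat for studying",
#                         top_k_captions("a chill beat for studying", musiccaps_captions))
```

The rewritten prompt would then be passed to the text-to-music model (Stable Audio 2.0 in the paper) in place of the original novice prompt.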
Submission Number: 1