Submission Type: 2-page Extended Abstract (Non-archival)
Keywords: generated music evaluation, human-centered evaluation, familiarity
Abstract: Human evaluation of generated music is often modeled as a direct function of sample-level representations, yet it remains unclear how much of subjective judgment can actually be explained by the generated sample alone. To investigate how listeners evaluate generated music, we conducted a continuation-based listening test using 200 piano continuations generated from shared primers across ten classical composer styles, rated by 190 listeners on six rating dimensions and Familiarity. Sample-only predictions captured general rating trends at the aggregate level, but performed poorly for individual listeners and failed to generalize to unseen composers. Variance decomposition further revealed that the unexplained portion of ratings was largely attributable to individual listener differences. We therefore examined Familiarity as a key interpretable listener-side factor. While Familiarity showed a strong style-level baseline across composer contexts, its explanatory role for ratings was driven primarily by continuation-level familiarity, and it also modulated how relative musical change was evaluated. These results suggest that evaluation of generated music is layered: beyond signal-level properties, human judgments reflect structured listener-side responses that are not fully captured by coarse similarity or sample-only representations.
Email Sharing: We authorize sharing author emails with Program Chairs.
Data Release: We understand and agree to the OpenReview metadata visibility policy.
Submission Number: 24