Keywords: music generation, plagiarism, symbolic music, overfitting, responsible AI, ethics in AI
TL;DR: We demonstrate that plagiarism in symbolic music generation arises from overfitting short motifs and investigate mitigation strategies.
Abstract: This paper examines plagiarism-like behaviors in Transformer-based models for symbolic music generation. While these models can produce musically convincing outputs, they also risk copying fragments from their training data. We hypothesize that such plagiarism arises from local overfitting of motifs (short patterns that recur within a piece) rather than from global overfitting to entire pieces. To test this hypothesis, we analyze motif repetition in the training data and assess motif-level plagiarism through perplexity and the originality of generated samples. Experiments show that frequently repeated motifs are predicted with lower perplexity and are more likely to reappear in generated outputs. We also explore preliminary strategies to mitigate plagiarism, namely label smoothing, transposition-based data augmentation, and Top-$K$ sampling, and evaluate their effectiveness.
Track: Paper Track
Confirmation: Paper Track: I confirm that I have followed the formatting guideline and anonymized my submission.
Submission Number: 67