Abstract: We present a novel approach to automatic multi-track music generation that integrates a Transformer architecture into a Generative Adversarial Network (GAN) framework. Although previous work has successfully produced extended multi-track compositions, the results often sound unnatural. This stems from unusual instrument performance, unconventional composition, and inconsistencies in scale and groove, all of which make the generated music feel as though it lacks a human touch. Our method incorporates a Transformer-based discriminator designed specifically to assess the human-like qualities of the generated music tracks. Experimental results demonstrate the effectiveness of our approach.