Abstract: Bitmaps are a popular graphics format, and modern raster-based generative computer vision models can produce images of remarkably high quality. However, the generation of images in vector format remains insufficiently studied. In this paper, we explore the task of generating music cover images from audio tracks and user-specified emotions. The task is relevant to designers, because vector graphics is the preferred format for bright, memorable illustrations. We build on previous work on the CoverGAN model and propose several corrections and innovations. First, we generate closed shapes instead of the ambiguous, shapeless curves produced by the previous model, which improves the structural integrity and visual appeal of the covers. Second, we replace the previous unstable text-placement model with a more robust algorithmic solution, improving the accuracy and aesthetic quality of text integration. Finally, we develop a neural network that generates the color palette of the cover, enabling more harmonious and visually appealing designs. We also conducted a user survey, which identified improvements brought by the proposed approach. Our work is thus a direct continuation of previous research, aimed specifically at enhancing the visual quality of the prior model's output. The code and demo for music cover image generation are available at https://github.com/IzhanVarsky/CoverGAN.
External IDs: doi:10.1007/978-3-032-07623-6_19