Abstract: Traditional image style transfer methods require a style image to be provided during training. If users do not have a suitable style image, or wish to transfer an abstract or imagined style, these methods are ineffective. With the introduction of the CLIP model, several text-based style transfer methods have been proposed, but they rely on pretrained generative models and do not effectively exploit feature information from different channels. To address these issues, we propose an attention-based text-driven style transfer model called AttenStyler. We adopt a "bigger-first, same-later" strategy to construct a biased channel attention module, which encourages the model to attend more effectively to different channel features during training and thereby enhances its expressive capability. In addition, we find that conventional bilinear interpolation upsampling struggles to capture content and style information across different images, so we replace it with transposed convolutions and apply convolutions to the skip connections to strengthen the model's feature fusion. We validate the model through qualitative analysis, quantitative analysis, and user studies. Extensive experiments demonstrate that AttenStyler generates higher-quality stylized images and outperforms several representative methods discussed in the paper.
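As a rough illustration of the decoder modification summarized above, the sketch below shows a learnable transposed-convolution upsampling step combined with a convolution applied to the skip connection before fusion. The module name `UpFuseBlock`, the channel sizes, and the exact layer arrangement are assumptions made for illustration only, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class UpFuseBlock(nn.Module):
    """Illustrative decoder block: transposed-convolution upsampling plus a
    convolution on the skip connection before fusion (assumed design)."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Transposed convolution replaces parameter-free bilinear upsampling,
        # so the upsampling step itself can adapt to content/style features.
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        # Convolution applied to the skip connection to aid feature fusion.
        self.skip_conv = nn.Conv2d(skip_ch, out_ch, kernel_size=3, padding=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * out_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)               # upsample decoder features
        skip = self.skip_conv(skip)  # transform encoder (skip) features
        return self.fuse(torch.cat([x, skip], dim=1))

# Example usage with assumed channel sizes and spatial resolutions
block = UpFuseBlock(in_ch=256, skip_ch=128, out_ch=128)
out = block(torch.randn(1, 256, 32, 32), torch.randn(1, 128, 64, 64))
print(out.shape)  # torch.Size([1, 128, 64, 64])
```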