Highlights
•We design a CLIP-based text-guided style transfer method, TextStyler, which not only removes the reliance of traditional style transfer on style images, but also produces artistic images with undistorted content and balanced stylization during texture transfer.
•We propose a novel style transformation network, STNet, which employs an encoder and a decoder with a multi-scale structure to capture hierarchical visual features of content images.
•We design a patch-wise perceptual loss that measures the difference between the content image and the stylized image from a patch-wise perspective.
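
The exact formulation of the patch-wise perceptual loss is not given in these highlights; the following is a minimal sketch under assumptions. It assumes the loss compares pretrained VGG feature maps of the content and stylized images over aligned, non-overlapping patches; the function name `patch_wise_perceptual_loss`, the choice of VGG-16 up to relu3_3, and the patch size are illustrative, not the paper's specification.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Hypothetical feature extractor: VGG-16 up to relu3_3 (an assumption, not the paper's setup).
_vgg = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def patch_wise_perceptual_loss(content: torch.Tensor,
                               stylized: torch.Tensor,
                               patch: int = 8,
                               stride: int = 8) -> torch.Tensor:
    """Mean squared error between corresponding feature patches of the
    content image and the stylized image (input normalization omitted)."""
    fc = _vgg(content)    # (B, C, H, W) features of the content image
    fs = _vgg(stylized)   # (B, C, H, W) features of the stylized image
    # Split feature maps into patches: (B, C*patch*patch, num_patches)
    pc = F.unfold(fc, kernel_size=patch, stride=stride)
    ps = F.unfold(fs, kernel_size=patch, stride=stride)
    # Average squared difference over all corresponding patches
    return F.mse_loss(ps, pc)

# Usage sketch: loss = patch_wise_perceptual_loss(content_batch, stylized_batch)
```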