Abstract: Style transfer networks, which combine stylistic features from a style image with content information from a content image to generate an output, have recently received widespread attention in the computer vision field. However, synthesizing high-resolution outputs comes at the cost of intensive computation, making it difficult to deploy style transfer networks on embedded devices with limited computational resources. To address this issue, we propose a compression algorithm for Image Transform Net (ITNet), an influential CNN-based style transfer network, and obtain a Light-weight Image Transform Net (LITNet). To improve the performance of ITNet, we modify its normalization layers and the structure of its upsampling blocks, and employ depthwise separable convolutions combined with a width multiplier to obtain a new light-weight network. However, because the representation ability of light-weight networks is too weak for unsupervised learning tasks such as style transfer, directly compressing the model with the above techniques leads to unstable training and poor outputs. To solve this problem, we propose a novel distillation loss that converts the unsupervised learning problem into a supervised one. In addition, the weights of the original losses and the distillation loss are balanced for better visual results. Experimental results demonstrate the effectiveness of LITNet: with minimal degradation in visual quality, the light-weight network achieves more than 67× compression in model size and a 63× reduction in FLOPs. Code and pre-trained models are available at https://github.com/shihuihong214/LITNet.
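The two main ingredients mentioned above, depthwise separable convolutions scaled by a width multiplier and a distillation term that supervises the compressed network with the full ITNet's output, can be pictured with a short PyTorch-style sketch. This is a minimal illustration under stated assumptions, not the released implementation: the module `DepthwiseSeparableConv`, the helper `scaled`, the weight `lambda_distill`, and the use of instance normalization, ReLU, and an MSE distillation term are all illustrative choices, not necessarily those used in LITNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled(channels, alpha=0.5):
    """Width multiplier: shrink a layer's channel count by a factor alpha."""
    return max(1, int(channels * alpha))

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)
        self.norm = nn.InstanceNorm2d(out_ch, affine=True)  # assumed normalization choice

    def forward(self, x):
        return F.relu(self.norm(self.pointwise(self.depthwise(x))))

def distillation_loss(student_out, teacher_out):
    """Pixel-wise supervision from the full (teacher) network's output; MSE form is assumed."""
    return F.mse_loss(student_out, teacher_out.detach())

# Example: replace a standard 3x3 convolution with a thinner separable block.
block = DepthwiseSeparableConv(scaled(32), scaled(64), stride=2)
feat = block(torch.randn(1, scaled(32), 256, 256))

# Illustrative loss combination: balance the original losses against the
# distillation term with a hand-tuned weight (hypothetical value).
teacher_out = torch.randn(1, 3, 256, 256)                       # stand-in for the ITNet output
student_out = torch.randn(1, 3, 256, 256, requires_grad=True)   # stand-in for the LITNet output
lambda_distill = 1.0
perceptual_loss = torch.tensor(0.0)                              # placeholder for content/style losses
total_loss = perceptual_loss + lambda_distill * distillation_loss(student_out, teacher_out)
```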