Abstract: Face photo-sketch synthesis has made remarkable progress with the rapid development of deep learning techniques. Cutting-edge methods directly learn the cross-domain mapping between photos and sketches, ignoring the available reference samples. We argue that reference samples can provide valuable prior information on texture and content for this task and improve the visual quality of synthetic images. This paper proposes a Dual Conditional Normalization Pyramid (DCNP) network with a multi-scale pyramid structure. The core of the DCNP network is a Dual Conditional Normalization (DCN) based architecture, which obtains prior information on different semantics from reference samples. Specifically, DCN contains two conditional normalization branches. The first branch performs spatially-adaptive normalization of the reference image conditioned on the semantic mask of the input image. The second branch performs adaptive instance normalization of the input image conditioned on the reference image. By decomposing the entire cross-domain mapping into two branches, DCN can emphasize the separate contributions of textural and spatial factors. To avoid information redundancy and improve the final performance, we propose a Gated Channel Attention Fusion (GCAF) module to distill and fuse the useful information from the two branches. Qualitative and quantitative experimental results demonstrate the superior performance of the proposed method over state-of-the-art approaches in structural information preservation and realistic texture generation. The code is publicly available at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/Tony0720/DCNP</uri>.
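The two branches described above correspond to two standard conditional normalization schemes: spatially-adaptive (SPADE-style) normalization, where per-position modulation parameters are predicted from a semantic mask, and adaptive instance normalization (AdaIN), where channel statistics are borrowed from a reference. The paper's actual implementation is not reproduced here; the following is a minimal, purely illustrative sketch of the normalization and gating mechanics on flat scalar feature lists (all function names and toy values are hypothetical, not from the paper):

```python
import math

EPS = 1e-5  # numerical-stability constant inside the square root


def mean_std(xs):
    """Mean and standard deviation of a flat list of feature values."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v + EPS)


def adain(content, style):
    """AdaIN-style branch: normalize the content features, then re-scale
    and re-shift them with the reference (style) statistics."""
    cm, cs = mean_std(content)
    sm, ss = mean_std(style)
    return [(x - cm) / cs * ss + sm for x in content]


def spade_like(features, gamma_map, beta_map):
    """SPADE-style branch: spatially-adaptive modulation with per-position
    gamma/beta, here assumed to be predicted from a semantic mask."""
    m, s = mean_std(features)
    return [(x - m) / s * g + b
            for x, g, b in zip(features, gamma_map, beta_map)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(a, b, gate_logits):
    """Toy channel gate: a sigmoid gate blends the two branch outputs,
    loosely in the spirit of the GCAF module."""
    return [sigmoid(g) * xa + (1.0 - sigmoid(g)) * xb
            for xa, xb, g in zip(a, b, gate_logits)]
```

After `adain`, the output carries the reference statistics: for example, `adain([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])` yields features whose mean and standard deviation match those of the style list.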