Enhancing Semantic Image Synthesis: A GAN-Based Approach with Multi-Feature Adaptive Denormalization Layer

Karim Magdy, Ghada Khoriba, Hala Abbas

Published: 01 Jan 2023, Last Modified: 02 Jul 2024, Venue: MEDI 2023, License: CC BY-SA 4.0

Abstract: Semantic image synthesis, a pivotal task in image-to-image translation, has been widely addressed using generative adversarial network (GAN) models. However, existing GAN-based approaches often incorporate structural and spatial information inadequately, which degrades the quality of the synthesized images and leaves a pronounced gap between real photographs and generated images. In this paper, we propose a novel GAN-based methodology that addresses these limitations, generating high-resolution images from semantic label maps while narrowing the quality gap and preserving fine detail in the generated outputs. The proposed approach is a two-step process. First, a local binary pattern convolutional generator produces a local binary pattern feature map. Then, a global convolutional generator receives both the segmentation map and this feature map through a modulation scheme that is learned during training and implemented by a multi-feature adaptive denormalization (MFADE) layer, and generates photo-realistic images. Extensive experiments on the Cityscapes, ADE20K, and COCO-Stuff datasets validate the performance of the proposed method and demonstrate its accuracy and robustness on semantic image synthesis tasks, paving the way for potential applications in urban sensing and data analytics in Smart Cities. The source code is available at https://github.com/karimmagdy/ULBPGAN.
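
The two ingredients named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the classic hand-computed 8-neighbour local binary pattern operator stands in for the learned local binary pattern generator, the modulation follows the general SPADE-style pattern (normalize, then apply a spatially varying scale and shift derived from the conditioning maps), and the names `local_binary_pattern`, `modulated_denorm`, `w_gamma`, and `w_beta` are hypothetical. Fixed per-pixel linear projections replace the learned convolutions a real layer would use.

```python
import numpy as np


def local_binary_pattern(gray):
    """Classic 8-neighbour LBP code for every interior pixel of a 2-D array."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets, clockwise from top-left; each one sets one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        mask = (neighbour >= center).astype(np.uint8)
        codes += mask * np.uint8(1 << bit)
    return codes


def modulated_denorm(x, cond, w_gamma, w_beta, eps=1e-5):
    """SPADE-style modulation (an assumption, not the paper's exact layer).

    x:       (C, H, W) activations
    cond:    (K, H, W) conditioning maps (segmentation one-hot + LBP map)
    w_gamma: (C, K) projection giving the per-pixel scale
    w_beta:  (C, K) projection giving the per-pixel shift
    """
    # Parameter-free per-channel normalization of the activations.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalized = (x - mean) / (std + eps)
    # Per-pixel linear projections stand in for learned convolutions.
    gamma = np.einsum("ck,khw->chw", w_gamma, cond)
    beta = np.einsum("ck,khw->chw", w_beta, cond)
    return normalized * (1.0 + gamma) + beta


# Toy usage with random data (all shapes and values are illustrative).
rng = np.random.default_rng(0)
image = rng.random((8, 8))
lbp = local_binary_pattern(image) / 255.0            # (6, 6), scaled to [0, 1]
labels = rng.integers(0, 3, size=lbp.shape)          # 3 hypothetical classes
one_hot = np.eye(3)[labels].transpose(2, 0, 1)       # (3, 6, 6)
cond = np.concatenate([one_hot, lbp[None]], axis=0)  # (4, 6, 6)
features = rng.standard_normal((5, 6, 6))
w_gamma = 0.1 * rng.standard_normal((5, 4))
w_beta = 0.1 * rng.standard_normal((5, 4))
out = modulated_denorm(features, cond, w_gamma, w_beta)
print(out.shape)
```

Concatenating the one-hot segmentation map with the texture map lets a single scale and shift depend on both class layout and local structure, which is one plausible reading of "multi-feature" conditioning; the layer in the linked repository may combine the two inputs differently.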