SMIT: Style-based Multi-level Feature Fusion for Image-to-Image Translation

Published: 01 Jan 2024 · Last Modified: 08 Apr 2025 · ICTC 2024 · CC BY-SA 4.0
Abstract: Image-to-image translation (I2I) involves converting images from one domain to another by learning a mapping that preserves the essential content while altering the image's appearance. Despite advances in deep learning, particularly with convolutional neural networks, I2I tasks remain challenging due to issues such as artifacts and the difficulty of maintaining content integrity during style transformations. Recent models such as CycleGAN, MUNIT, and CUT have made strides in addressing these challenges, but limitations persist, particularly regarding diversity, realism, and precise style control. To overcome these limitations, we propose SMIT, a novel I2I model that leverages the strengths of the StyleGAN architecture combined with the Swin Transformer. Our model uses Swin-T as the encoder to extract multi-resolution features, which are then integrated into the StyleSwin model through a multi-level fuser module. This approach embeds contextual information effectively, ensuring high-quality image generation with accurate domain transformations. Our model demonstrates superior performance over existing I2I methods on public datasets, validating its effectiveness through both qualitative and quantitative assessments.
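The pipeline the abstract describes (a Swin-T encoder emitting multi-resolution features that a multi-level fuser injects into the StyleSwin generator) can be sketched as below. This is a minimal PyTorch illustration, not the authors' implementation: the class names (ToyEncoder, MultiLevelFuser), the channel widths, and the additive 1x1-projection fusion are our assumptions, and the conv stand-in only mimics the output shapes of a real Swin-T.

```python
import torch
import torch.nn as nn

# Stand-in encoder: three downsampling stages emitting multi-resolution
# features, playing the role the paper assigns to Swin-T. A real Swin-T
# backbone would be substituted here; this conv stack only mimics the
# shapes of its stage outputs.
class ToyEncoder(nn.Module):
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        chans = (3,) + tuple(channels)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.LeakyReLU(0.2),
            )
            for c_in, c_out in zip(chans[:-1], chans[1:])
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # list of features, highest resolution first

# Hypothetical multi-level fuser: one 1x1 projection per resolution,
# merged into the generator stream by addition. The paper's exact
# fusion operator (concatenation, attention, etc.) may differ.
class MultiLevelFuser(nn.Module):
    def __init__(self, enc_channels, gen_channels):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(e, g, kernel_size=1)
            for e, g in zip(enc_channels, gen_channels)
        )

    def forward(self, gen_feats, enc_feats):
        # Fuse each generator-stage feature with the encoder feature
        # at the matching resolution.
        return [g + p(e) for g, e, p in zip(gen_feats, enc_feats, self.proj)]

# Usage: fuse encoder features into same-resolution generator features.
enc = ToyEncoder()
fuser = MultiLevelFuser(enc_channels=(64, 128, 256),
                        gen_channels=(96, 192, 384))
x = torch.randn(1, 3, 256, 256)
enc_feats = enc(x)  # resolutions 128, 64, 32
gen_feats = [torch.randn(1, c, f.shape[-2], f.shape[-1])
             for c, f in zip((96, 192, 384), enc_feats)]
fused = fuser(gen_feats, enc_feats)
print([f.shape for f in fused])
```

Per-level 1x1 projections keep each resolution's fusion cheap while letting the generator stream keep its own channel widths; the abstract does not specify SMIT's actual fusion operator, so the additive form above is purely illustrative.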