Keywords: Deep Learning, Computer Vision, Generative AI, Multimodal
Abstract: Despite remarkable advances in image generation, existing diffusion models struggle to capture diverse cultural aesthetics. While Low-Rank Adaptation (LoRA) enables efficient fine-tuning, conventional approaches lack semantic awareness and apply uniform adaptations across all features, leading to suboptimal cultural representation. To address these limitations, we introduce K-StyleLoRA, a novel framework that leverages CLIP's cross-modal understanding for culturally aware image generation. Our approach consists of two key innovations. First, CLIP-Guided Information Gating dynamically modulates LoRA adaptations based on cultural relevance scores, enabling selective enhancement of culturally relevant features while suppressing irrelevant ones. Second, Cultural Semantic Loss provides additional semantic guidance through CLIP-based similarity optimization with Korean cultural concepts. Extensive experiments on Korean traditional art demonstrate superior cultural fidelity while maintaining generation quality and diversity. Most notably, K-StyleLoRA demonstrates exceptional cultural transfer capability on generic prompts requiring implicit cultural understanding, achieving a Cultural Similarity Score of 0.274, a 9.6% improvement over the vanilla SDXL baseline (0.250). Our framework establishes semantic-aware adaptation as a powerful paradigm for cultural representation, offering a scalable approach that can be extended to diverse cultural contexts and generation tasks beyond Korean aesthetics. Additional qualitative results and visual comparisons are available at our project page: REMOVED-FOR-REVIEW
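To make the two components described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes a gated LoRA linear layer whose low-rank update is scaled by a scalar derived from a CLIP-style relevance embedding, plus a cosine-similarity loss against cultural concept embeddings. All names (GatedLoRALinear, cultural_semantic_loss, clip_dim) and the use of random tensors as stand-ins for real CLIP features are assumptions for illustration.

```python
# Hypothetical sketch of CLIP-guided gating of a LoRA update and a
# CLIP-similarity "cultural semantic" loss; names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedLoRALinear(nn.Module):
    """Frozen linear layer plus a low-rank update scaled by a gate in (0, 1)."""

    def __init__(self, in_features, out_features, rank=8, clip_dim=512):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)      # frozen pretrained weight
        self.lora_down = nn.Linear(in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)          # adaptation starts as a no-op
        # Gate: maps a CLIP-derived relevance embedding to a scalar score.
        self.gate = nn.Sequential(nn.Linear(clip_dim, 1), nn.Sigmoid())

    def forward(self, x, cultural_embed):
        gate = self.gate(cultural_embed)             # (batch, 1) relevance score
        lora_out = self.lora_up(self.lora_down(x))   # low-rank adaptation path
        while gate.dim() < lora_out.dim():           # broadcast over token dims
            gate = gate.unsqueeze(1)
        return self.base(x) + gate * lora_out


def cultural_semantic_loss(image_embeds, concept_embeds):
    """Pull image embeddings toward the nearest cultural concept embedding."""
    image_embeds = F.normalize(image_embeds, dim=-1)
    concept_embeds = F.normalize(concept_embeds, dim=-1)
    sim = image_embeds @ concept_embeds.t()          # cosine similarities
    return 1.0 - sim.max(dim=-1).values.mean()


# Toy usage with random tensors standing in for real CLIP features.
layer = GatedLoRALinear(in_features=320, out_features=320)
x = torch.randn(2, 77, 320)                          # e.g. cross-attention activations
cultural_embed = torch.randn(2, 512)                 # stand-in CLIP text/image features
out = layer(x, cultural_embed)

img = torch.randn(2, 512)                            # stand-in CLIP image embeddings
concepts = torch.randn(5, 512)                       # stand-in concept text embeddings
loss = cultural_semantic_loss(img, concepts)
print(out.shape, loss.item())
```

In this reading, the gate plays the role of the cultural relevance score that selectively enables or suppresses the LoRA pathway, while the loss term supplies the CLIP-based semantic guidance; how the actual paper computes the relevance scores and selects concept prompts is not specified here.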
Submission Number: 7