Abstract: Ancient murals carry invaluable historical information that reflects the evolution of civilizations, making them an irreplaceable component of cultural heritage. However, due to prolonged natural weathering and human-induced damage, these murals often exhibit characteristic deterioration such as peeling, scratches, and fading of mineral pigments. The structural complexity and contextual consistency of damaged areas present significant challenges for large-scale mural image inpainting. Recent progress in text-to-image diffusion models has shown remarkable performance in generating high-quality natural images, suggesting that such models could potentially alleviate these challenges. Nevertheless, they often struggle to efficiently use specialized textual descriptions to guide mural inpainting. To address these challenges, we propose a framework called K2Mural based on a pre-trained multi-modal diffusion model. Specifically, we align each mural image with a textual description to facilitate Low-Rank Adaptation (LoRA) based fine-tuning. K2Mural then uses our collected mural image-text pairs to capture key structural patterns and inpaint damaged regions by leveraging the cross-modal synthesis capability of the large-scale model. Extensive experiments on both synthetic mural datasets and real-world damaged murals verify that our method maintains structural integrity and exhibits high stylistic consistency when inpainting mural images. We provide our code at https://github.com/lgdEdric/K2Mural as supplementary materials for review and will publish it upon acceptance.
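The abstract mentions LoRA-based fine-tuning of the pre-trained diffusion model. As a minimal sketch of the underlying idea (not the paper's actual training code), the snippet below illustrates in NumPy how LoRA freezes a pre-trained weight matrix W and learns only a low-rank correction (alpha/r)·B·A; the dimensions, learning rate, and the helper `lora_forward` are illustrative assumptions, not taken from K2Mural.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pre-trained weight (stand-in for a diffusion-model projection layer).
d_in, d_out, r = 8, 8, 2
W = rng.normal(size=(d_out, d_in))

# LoRA factors: A is random, B starts at zero, so the adapted layer
# initially behaves exactly like the frozen pre-trained one.
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))
alpha = 4.0  # LoRA scaling; effective update is (alpha / r) * B @ A

def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha/r) * B A x; only A and B would be trained."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
y0 = lora_forward(x, W, A, B, alpha, r)
assert np.allclose(y0, W @ x)  # zero-initialized B => no change before training

# One illustrative gradient step on B toward an arbitrary target output
# (in the paper this role is played by the diffusion training loss).
target = rng.normal(size=(d_out,))
err = lora_forward(x, W, A, B, alpha, r) - target
B -= 1e-3 * (alpha / r) * np.outer(err, A @ x)  # dL/dB for L = 0.5*||err||^2
```

With B zero-initialized, the adapter is a no-op at the start of fine-tuning, and only the small matrices A and B (rank r) receive gradients, which is what makes LoRA cheap enough to specialize a large model on a modest mural image-text corpus.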
External IDs: dblp:conf/icic/LiPY25