Reconstructing, Understanding, and Analyzing Relief Type Cultural Heritage from a Single Old Photo

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, Venue: MM2024 Oral, License: CC BY 4.0
Abstract: Relief-type cultural heritage objects are commonly found at historical sites but often show varying degrees of damage and deterioration. The traditional process of reconstructing these reliefs is laborious and requires extensive manual intervention and specialized archaeological knowledge. Given a single old photo that records the pre-damage state of a relief, monocular depth estimation can be used to reconstruct a 3D digital model. However, extracting depth variations along edges is challenging in the relief scenario because the depth values are highly compressed, which results in low-curvature edges. This paper proposes a solution that leverages a multi-task neural network to enhance depth estimation by integrating edge detection and semantic segmentation. We redefine edge detection on relief data as a multi-class classification task rather than the typical binary classification task, and we propose an edge matching module that performs this novel task to refine depth estimates specifically in edge regions. The proposed approach achieves better depth estimation results with finer details along the edge regions. Additionally, the semantic and edge outputs provide a comprehensive reference for multi-modal understanding and analysis. This paper not only advances computer vision tasks but also provides effective technical support for the protection of relief-type cultural heritage objects.
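To make the multi-task idea concrete, the following is a minimal sketch of how a depth term, a semantic segmentation term, and a multi-class edge term might be combined into one training objective. It is not the authors' loss: the function names, the L1 depth term, the plain cross-entropy terms, and the weights `w_depth`, `w_seg`, and `w_edge` are all assumptions made for illustration, and the abstract does not say how the edge matching module is trained.

```python
import math

def softmax_cross_entropy(logits, target):
    """Cross-entropy for one pixel: logits is a list of class scores, target a class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def multi_task_loss(depth_pred, depth_gt, seg_logits, seg_gt, edge_logits, edge_gt,
                    w_depth=1.0, w_seg=1.0, w_edge=1.0):
    """Weighted sum of per-pixel losses over flattened pixel lists.

    Hypothetical formulation: L1 for depth, cross-entropy for segmentation,
    and cross-entropy over more than two edge classes (multi-class, not binary).
    """
    depth_term = mean(abs(p - g) for p, g in zip(depth_pred, depth_gt))
    seg_term = mean(softmax_cross_entropy(l, t) for l, t in zip(seg_logits, seg_gt))
    edge_term = mean(softmax_cross_entropy(l, t) for l, t in zip(edge_logits, edge_gt))
    return w_depth * depth_term + w_seg * seg_term + w_edge * edge_term

# Two pixels, two semantic classes, three edge classes (all values made up).
loss = multi_task_loss(
    depth_pred=[0.2, 0.5], depth_gt=[0.2, 0.7],
    seg_logits=[[0.0, 0.0], [0.0, 0.0]], seg_gt=[0, 1],
    edge_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], edge_gt=[2, 0],
)
```

The only point the sketch is meant to convey is that the edge head predicts one of several classes per pixel, so its loss has the same form as the segmentation loss rather than a binary edge/non-edge loss.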
Primary Subject Area: [Experience] Art and Culture
Secondary Subject Area: [Experience] Multimedia Applications, [Content] Multimodal Fusion, [Content] Media Interpretation
Relevance To Conference: This work fully meets the requirements of the primary subject area “Art and Culture” and is also related to the theme of “Multimedia Content Understanding”. We contribute to the field of multimedia/multimodal processing through a multi-task learning-based method applied to relief-type cultural heritage. The proposed method performs monocular depth estimation, semantic segmentation, and a newly defined soft edge detection task within a single network. The ability to generate multi-modal feature maps from a single photo marks an advancement in the multimodal processing domain. It also provides effective technical support for the multi-modal understanding and analysis of relief-type cultural heritage. Moreover, with the depth feature map, the proposed method can reconstruct damaged relief-type cultural heritage from a single old photo documenting its pre-damage state. We summarize the limitations of related works on digitally reconstructing relief-type cultural heritage into 3D models. According to our experimental results, the proposed optimization methods effectively address these limitations. Therefore, this work demonstrates a novel application of multimedia/multimodal processing techniques in both computer vision and cultural heritage preservation.
Supplementary Material: zip
Submission Number: 5044