Advancing Semantic Edge Detection through Cross-Modal Knowledge Learning

Published: 01 Jan 2024, Last Modified: 19 Dec 2024 · ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: Semantic edge detection (SED) is pivotal for the precise demarcation of object boundaries, yet it remains hampered by the prevalence of low-quality labels in current methods. In this paper, we present a novel solution that strengthens SED by encoding both language and image data. Unlike prior language-driven techniques, which predominantly rely on static elements such as dataset labels, our method exploits dynamic language content that describes the objects in each image and their interrelations. By encoding this varied input, we generate integrated features that use semantic insights to refine the high-level image features and the final mask representations, improving the quality of these features and raising SED performance. Experimental evaluation on benchmark datasets, including SBD and Cityscapes, demonstrates the efficacy of our method, which achieves leading ODS F-scores of 79.0 and 76.0, respectively. Our approach marks a notable advance in SED by seamlessly integrating multimodal textual information, embracing both its static and dynamic aspects.
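The abstract describes refining high-level image features and mask representations with encoded language about the objects in each image, but gives no implementation details. The following is a minimal, illustrative sketch of one way such cross-modal refinement could be wired up, assuming a PyTorch setting, a backbone that yields high-level image features, and a text encoder (e.g., CLIP-style) that yields embeddings of per-image object descriptions; the module, dimensions, and fusion choice (cross-attention) are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse text embeddings with high-level image features via cross-attention,
    then predict per-class semantic edge logits (illustrative sketch only)."""

    def __init__(self, img_dim=2048, txt_dim=512, fused_dim=256,
                 num_classes=19, num_heads=8):
        super().__init__()
        self.img_proj = nn.Conv2d(img_dim, fused_dim, kernel_size=1)   # project backbone features
        self.txt_proj = nn.Linear(txt_dim, fused_dim)                  # project text embeddings
        self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(fused_dim)
        self.edge_head = nn.Conv2d(fused_dim, num_classes, kernel_size=1)  # per-class edge logits

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, img_dim, H, W) high-level features from an image backbone
        # txt_feats: (B, T, txt_dim) embeddings of sentences describing objects and relations
        x = self.img_proj(img_feats)                       # (B, fused_dim, H, W)
        B, C, H, W = x.shape
        queries = x.flatten(2).transpose(1, 2)             # (B, H*W, fused_dim)
        keys = self.txt_proj(txt_feats)                    # (B, T, fused_dim)
        attended, _ = self.cross_attn(queries, keys, keys) # image queries attend to text
        fused = self.norm(queries + attended)              # residual connection + layer norm
        fused = fused.transpose(1, 2).reshape(B, C, H, W)
        return self.edge_head(fused)                       # (B, num_classes, H, W)


if __name__ == "__main__":
    # Dummy usage: 2 images, 5 descriptive sentences each.
    model = CrossModalFusion()
    img_feats = torch.randn(2, 2048, 16, 16)
    txt_feats = torch.randn(2, 5, 512)
    print(model(img_feats, txt_feats).shape)  # torch.Size([2, 19, 16, 16])
```

Cross-attention is only one plausible fusion mechanism; simpler concatenation or gating schemes would fit the same interface, and the class count (19 here, matching Cityscapes) would change per dataset.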