Adaptive knowledge distillation and integration for weakly supervised referring expression comprehension
Abstract: Weakly supervised referring expression comprehension (REC) aims to ground target objects in images according
to given referring expressions, where the mappings between image regions and referring expressions are
unavailable during training. Existing models typically reconstruct the multimodal relationships
needed to ground targets from off-the-shelf information, but neglect to further exploit helpful knowledge
that could enhance model performance. To address this issue, we propose an adaptive knowledge distillation
architecture that enriches the predominant paradigm of weakly supervised REC by transferring target-aware and
interaction-aware knowledge from a pre-trained teacher grounder to improve the grounding performance of
the student model. Specifically, to encourage the teacher to impart more reliable knowledge, we
present a Knowledge Confidence-Based Adaptive Temperature (KCAT) learning approach that learns optimal
temperatures for transferring target-aware and interaction-aware knowledge with high prediction confidence.
Moreover, to encourage the student to absorb more helpful knowledge, we introduce a Student Competency-Based
Adaptive Weight (SCAW) learning strategy that dynamically integrates the distilled target-aware and
interaction-aware knowledge, strengthening the student's grounding certainty. We conduct extensive experiments on three
benchmark datasets, RefCOCO, RefCOCO+, and RefCOCOg, to validate the proposed approach. Experimental
results demonstrate that our approach achieves superior performance over state-of-the-art methods with
the aid of adaptive knowledge distillation and integration. The code and trained models are available at:
https://github.com/dami23/WREC_AdaptiveKD.
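To illustrate the two mechanisms described in the abstract, the following is a minimal PyTorch sketch of confidence-based adaptive temperatures (the KCAT idea) and competency-based adaptive weights (the SCAW idea) in a distillation loss. The function names, the entropy-based confidence measure, and the certainty-based weighting heuristic are illustrative assumptions for exposition, not the paper's exact formulation; the authors' implementation is in the linked repository.

    import torch
    import torch.nn.functional as F

    def confidence_adaptive_temperature(teacher_logits, t_min=1.0, t_max=4.0):
        # Assumption: teacher confidence is measured by normalized entropy.
        # Confident (low-entropy) predictions get a lower temperature, so
        # their sharper, more reliable distributions dominate distillation.
        probs = F.softmax(teacher_logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)
        max_entropy = torch.log(torch.tensor(float(teacher_logits.size(-1))))
        confidence = 1.0 - entropy / max_entropy              # in [0, 1]
        return t_min + (1.0 - confidence) * (t_max - t_min)   # per-sample T

    def kd_loss(student_logits, teacher_logits, temperature):
        # Standard temperature-scaled KL distillation, computed per sample.
        t = temperature.unsqueeze(-1)
        log_p_s = F.log_softmax(student_logits / t, dim=-1)
        p_t = F.softmax(teacher_logits / t, dim=-1)
        kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1)
        return (temperature ** 2) * kl

    def adaptive_distillation_loss(student_target, teacher_target,
                                   student_inter, teacher_inter):
        # KCAT-style step: per-sample temperatures from teacher confidence,
        # separately for target-aware and interaction-aware knowledge.
        t_target = confidence_adaptive_temperature(teacher_target)
        t_inter = confidence_adaptive_temperature(teacher_inter)
        # SCAW-style step: weight each distilled knowledge stream by the
        # student's lack of certainty, so a weaker student leans more
        # heavily on the teacher. This weighting rule is an assumption.
        w_target = 1.0 - F.softmax(student_target, dim=-1).max(dim=-1).values
        w_inter = 1.0 - F.softmax(student_inter, dim=-1).max(dim=-1).values
        loss = (w_target * kd_loss(student_target, teacher_target, t_target)
                + w_inter * kd_loss(student_inter, teacher_inter, t_inter))
        return loss.mean()

    # Example usage: 8 referring expressions, 20 candidate regions each.
    s_t, t_t = torch.randn(8, 20), torch.randn(8, 20)
    s_i, t_i = torch.randn(8, 20), torch.randn(8, 20)
    print(adaptive_distillation_loss(s_t, t_t, s_i, t_i))

In this sketch, both adaptive quantities are computed per sample rather than globally, reflecting the abstract's emphasis on transferring knowledge selectively by prediction confidence and integrating it according to the student's current competency.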