Highlights
• A Dual-Level Adaptive Incongruity-Enhanced Model (DAIE) is proposed for multimodal sarcasm detection.
• Token-level contrastive learning (TLCL) uses Patch-based Reconstructed Images (PRI) to reduce the features shared among visually similar images.
• The graph-level contrastive learning (GLCL) module uses Negative pair Similarity Weights (NSW) to dynamically adjust the inter-node weights in the Graph Attention Networks (GAT).
• Experimental results on a publicly available multimodal sarcasm detection dataset demonstrate the superiority of the proposed method.