Abstract: Hateful memes have become a significant concern on the Internet, necessitating robust automated detection systems. While Large Multimodal Models (LMMs) have shown promise in hateful meme detection, they face notable challenges like sub-optimal performance and limited out-of-domain generalization capabilities. Recent studies further reveal the limitations of both supervised fine-tuning (SFT) and in-context learning when applied to LMMs in this setting. To address these issues, we propose a robust adaptation framework for hateful meme detection that enhances in-domain accuracy and cross-domain generalization while preserving the general vision-language capabilities of LMMs. Analysis reveals that our approach achieves improved robustness under adversarial attacks compared to SFT models. Experiments on six meme classification datasets show that our approach achieves state-of-the-art performance, outperforming larger agentic systems. Moreover, our method generates higher-quality rationales for explaining hateful content compared to standard SFT, enhancing model interpretability. Code available at https://github.com/JingbiaoMei/RGCL