Robust Image Hashing Based on Contrastive Masked Autoencoder with Weak-Strong Augmentation Alignment
Abstract: Numerous robust image hashing schemes have recently been developed for content identification. However, many of them struggle to maintain discrimination while resisting large-scale attacks. In this paper, we propose a robust image hashing scheme based on a Contrastive Masked Autoencoder with weak-strong augmentation Alignment (CMAA). Leveraging contrastive learning, CMAA is designed to learn features that are robust to large-scale and hybrid attacks while remaining discriminative. Specifically, it uses a distribution divergence to align features augmented with weak attacks with features augmented with strong attacks, a step we call weak-strong augmentation alignment, which enhances robustness to strong attacks. In addition, a masked vision transformer is incorporated to further improve content identification performance. CMAA also includes a parameter-free quantization layer to mitigate the loss induced by binarization. Experimental results demonstrate that our method is robust against a variety of attacks, including challenging ones such as rotation and hybrid attacks, and delivers identification performance with an F1 score close to 1.0. Our code and supplementary materials are available on GitHub.
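The weak-strong alignment and binarization steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the divergence, its direction, or the form of the quantization layer, so the KL divergence over softmax-normalized features, the use of the weak view as the target, the sign-threshold binarization, and the Hamming-distance comparison are all assumptions, as are the function names.

```python
import math

def softmax(xs, temperature=1.0):
    """Turn a feature vector into a probability distribution."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(weak_feat, strong_feat, temperature=1.0):
    """Hypothetical weak-strong alignment term.

    Assumption: the weakly augmented view is the target distribution and
    the strongly augmented view is pulled toward it with a KL divergence.
    """
    p = softmax(weak_feat, temperature)
    q = softmax(strong_feat, temperature)
    return kl_divergence(p, q)

def binarize(features):
    """Parameter-free binarization by sign thresholding (an assumption;
    the paper's quantization layer may differ)."""
    return [1 if f >= 0 else 0 for f in features]

def hamming_distance(a, b):
    """Number of differing bits between two equal-length binary hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Toy features for one image under a weak and a strong attack.
weak = [0.9, -0.4, 0.2, -1.1]
strong = [0.7, -0.1, 0.3, -0.8]
loss = alignment_loss(weak, strong)
distance = hamming_distance(binarize(weak), binarize(strong))
```

In this toy setting, a small `loss` and a small `distance` indicate that the strongly attacked copy stays close to the weakly attacked one both before and after binarization, which is the behavior the alignment term is meant to encourage.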