Abstract: Self-supervised contrastive learning (CL) effectively learns
transferable representations from unlabeled data containing
images or image-text pairs but is vulnerable to data-poisoning backdoor attacks (DPCLs). An adversary can
inject poisoned images into pretraining datasets, causing
compromised CL encoders to exhibit targeted misbehavior
in downstream tasks. Existing DPCLs, however, achieve limited efficacy because they rely on a fragile implicit co-occurrence between the backdoor trigger and the target object and inadequately suppress the discriminative features of backdoored images. We propose Noisy Alignment (NA), a DPCL
method that explicitly suppresses noise components in poisoned images. Inspired by powerful training-controllable CL
attacks, we identify and extract the critical objective of noisy
alignment, adapting it effectively to the data-poisoning setting. Our method implements noisy alignment by strategically manipulating contrastive learning’s random cropping mechanism, formulating this process as an image layout optimization problem with theoretically derived optimal
parameters. The resulting method is simple yet effective,
outperforming existing DPCLs while maintaining clean-data accuracy. Furthermore, Noisy Alignment is robust against
common backdoor defenses. Code is available at https://github.com/jsrdcht/Noisy-Alignment.
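To illustrate the kind of cropping manipulation the abstract refers to, the following is a minimal, hypothetical sketch (not the authors' released implementation): a poisoned image is composed as a layout of a trigger-bearing reference region and a noise region, so that under SimCLR-style two-view random cropping a sizable fraction of positive pairs aligns a reference/trigger view with a noise view. The layout fraction ref_frac, the canvas and crop sizes, and all helper names are illustrative assumptions, not the paper's derived parameters.

```python
import random
import numpy as np

def nn_resize(img, out_h, out_w):
    """Nearest-neighbour resize with pure NumPy indexing."""
    h, w = img.shape[:2]
    return img[np.arange(out_h) * h // out_h][:, np.arange(out_w) * w // out_w]

def compose_poisoned_image(reference, noise_patch, canvas_size=224, ref_frac=0.5):
    """Side-by-side layout of a trigger-bearing reference region and a noise
    region. `ref_frac` is a hypothetical layout parameter, a stand-in for a
    theoretically derived optimum."""
    split = int(canvas_size * ref_frac)
    canvas = np.empty((canvas_size, canvas_size, 3), dtype=np.uint8)
    canvas[:, :split] = nn_resize(reference, canvas_size, split)
    canvas[:, split:] = nn_resize(noise_patch, canvas_size, canvas_size - split)
    return canvas, split

def random_crop_box(size, crop=96):
    """Top-left corner of a SimCLR-style random crop on a size x size image."""
    return random.randint(0, size - crop), random.randint(0, size - crop)

# Monte-Carlo estimate of how often the two contrastive views straddle the
# layout, i.e. one view centred on the reference half and the other on the
# noise half -- the event a layout optimisation would try to make frequent.
reference = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
noise = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
poisoned, split = compose_poisoned_image(reference, noise)
crop, trials, straddle = 96, 10_000, 0
for _ in range(trials):
    (_, x1), (_, x2) = random_crop_box(224, crop), random_crop_box(224, crop)
    c1, c2 = x1 + crop / 2, x2 + crop / 2  # horizontal crop centres
    straddle += (c1 < split) != (c2 < split)
print(f"fraction of cross-region view pairs: {straddle / trials:.2f}")
```

Raising the frequency of such cross-region positive pairs is, at a high level, the quantity an image-layout optimization of this sort would target; the actual objective and optimal parameters are derived in the paper.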