04 Oct 2024 · CC BY 4.0
Adversarial contrastive learning aims to learn a representation space robust to adversarial inputs using only unlabeled data. Existing methods typically generate adversarial perturbations by maximizing the contrastive loss during adversarial training. However, we find that the effectiveness of this approach depends on the composition of positive and negative examples in a minibatch, which is not explicitly controllable. To address this limitation, we propose a novel approach to adversarial contrastive learning in which adversarial perturbations are generated based on the clustering structure of the representation space learned through contrastive learning. Our method is motivated by the observation that contrastive learning produces a well-separated representation space, where similar data points cluster together while dissimilar ones are positioned farther apart. We hypothesize that perturbations directed toward a neighboring cluster (specifically, the second nearest) are likely to cross the decision boundary of a downstream classifier built upon contrastive learning, effectively acting as adversarial examples. A key challenge in our approach is determining an appropriate number of clusters; the number of classes in the downstream task would serve this purpose, but it is typically unknown during adversarial contrastive learning. We therefore employ the silhouette score to identify the optimal number of clusters, ensuring high-quality clustering in the representation space. Compared to existing approaches, our method achieves up to 2.25% and 5.05% improvements in robust accuracy against PGD and AutoAttack, respectively, while also showing slight improvements in standard accuracy in most cases.
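The two core ingredients described above (selecting the cluster count by silhouette score, and directing perturbations toward the second-nearest cluster) can be sketched as follows. This is an illustrative sketch under assumed details, not the paper's implementation: the function names `select_num_clusters` and `perturbation_direction`, the use of k-means, and the candidate range of cluster counts are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def select_num_clusters(features, k_range=range(2, 10), seed=0):
    """Pick the cluster count that maximizes the silhouette score.

    features: (n_samples, dim) array of learned representations.
    """
    best_k, best_score = None, -1.0
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        score = silhouette_score(features, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def perturbation_direction(feature, centroids):
    """Unit vector pointing from a representation toward the centroid of
    its second-nearest cluster, used as the adversarial perturbation direction."""
    dists = np.linalg.norm(centroids - feature, axis=1)
    second_nearest = np.argsort(dists)[1]  # index 0 is the nearest cluster
    direction = centroids[second_nearest] - feature
    return direction / (np.linalg.norm(direction) + 1e-12)
```

In an adversarial training loop, the input would then be perturbed so that its representation moves along this direction (e.g., within an epsilon-ball), replacing the usual contrastive-loss-maximizing perturbation.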