Keywords: data augmentation, mixup, image classification, transfer learning, object detection
Abstract: Data augmentation methods have shown impressive performance in learning training data distributions to minimize the generalization gap. Recently, adversarial mixup methods have superseded these approaches, producing mixed samples online to improve the robustness and generalization of deep neural networks. In addition, previous saliency-based methods simply extract the salient region from a source image and paste it into a target image. Although these approaches improve performance, they may introduce unreliable samples during training and incur substantial computational overhead. In this paper, we introduce a Self-Saliency ($S^2$) mixup method that creates challenging samples by extracting only salient patches at varying scales and placing them back into the non-salient regions of the same image. The aim is to learn scale-invariant features that improve generalization with less computational overhead. In addition, to improve resilience against adversarial perturbations, we propose a new approach, \textit{FracMix}, which mixes self-similarity patterns into salient patches only, with different mixing ratios. The proposed $S^2$-FracMix enables the model to learn from both fractal and non-fractal structures simultaneously within a single training image, offering a more targeted and label-consistent form of augmentation. $S^2$-FracMix achieves state-of-the-art performance on seven datasets, outperforming existing methods on coarse- and fine-grained classification, robustness against corruption, calibration, contrastive learning, object detection, data scarcity ($5$, $10$, and $100$ shots), and transfer learning.
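The core $S^2$ operation described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the saliency extractor here is a simple gradient-magnitude proxy, the patch size and scale factor are arbitrary, and the FracMix self-similarity mixing is omitted. It only shows the idea of copying the most salient patch at a different scale into the least salient region of the same image.

```python
import numpy as np

def saliency_map(img):
    # Proxy saliency: sum of absolute intensity gradients (the paper's
    # actual saliency extractor is not specified here; this is an assumption).
    gray = img.mean(axis=2)
    gx = np.abs(np.diff(gray, axis=0, prepend=gray[:1]))
    gy = np.abs(np.diff(gray, axis=1, prepend=gray[:, :1]))
    return gx + gy

def self_saliency_mixup(img, patch=8, scale=2):
    """Copy the most salient patch, rescale it, and paste it into the
    least salient region of the *same* image (hedged sketch of S^2 mixup)."""
    sal = saliency_map(img)
    h, w = sal.shape
    ph, pw = h // patch, w // patch
    # Score each non-overlapping patch by its total saliency.
    scores = sal[:ph * patch, :pw * patch].reshape(
        ph, patch, pw, patch).sum(axis=(1, 3))
    si, sj = np.unravel_index(scores.argmax(), scores.shape)  # most salient
    ti, tj = np.unravel_index(scores.argmin(), scores.shape)  # least salient
    src = img[si * patch:(si + 1) * patch, sj * patch:(sj + 1) * patch]
    # Rescale the salient patch (nearest-neighbour zoom), then crop it
    # back to patch size so it fits the target region.
    scaled = src.repeat(scale, axis=0).repeat(scale, axis=1)[:patch, :patch]
    out = img.copy()
    out[ti * patch:(ti + 1) * patch, tj * patch:(tj + 1) * patch] = scaled
    return out
```

Because the patch is relocated within the same image, the label stays consistent, which is the label-consistency property the abstract emphasizes over source-to-target saliency pasting.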
Supplementary Material: pdf
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 14884