Abstract: Because neural networks are inherently vulnerable, adversarial attacks pose formidable challenges to the robustness and reliability of deep learning models. In contrast to traditional adversarial training (AT) methods that prioritize semantic distillation and purification, our work attributes the insufficient adversarial robustness of models to two problems: spatial attention shift and channel activation disarray. To mitigate these issues, we propose a robust spatial-aligned and channel-adapted learning paradigm, termed StayFocused, which integrates spatial alignment and channel adaptation to strengthen the focus region against adversarial attacks by adaptively recalibrating spatial attention and channel responses. Specifically, StayFocused benefits mainly from two flexible mechanisms: Spatial-aligned Hypersphere Constraint (SHC) and Channel-adapted Prompting Calibration (CPC). SHC aims to enhance intra-class compactness and inter-class separation between adversarial and natural samples by measuring angular margins and distribution distance within the hypersphere space. Inspired by the top-$K$ candidate prompts from the clean sample, CPC dynamically recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels. To learn feature representations comprehensively, the StayFocused framework can be easily extended with additional branches in a multi-head training manner, further enhancing the model's robustness and adaptability. Extensive experiments on multiple benchmark datasets consistently demonstrate the effectiveness of StayFocused and its superiority over state-of-the-art baselines.
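As a rough illustration of what measuring angular margins on a hypersphere can look like, the following minimal sketch projects feature vectors onto the unit sphere and compares their angles. It is not the authors' SHC: the function names (`l2_normalize`, `angular_distance`, `satisfies_margin`), the triplet-style margin check, and the toy feature values are all assumptions made for illustration, and the abstract does not specify the actual loss.

```python
import math


def l2_normalize(v):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def angular_distance(u, v):
    """Angle in radians between two feature vectors after normalization."""
    u_hat, v_hat = l2_normalize(u), l2_normalize(v)
    cos = sum(a * b for a, b in zip(u_hat, v_hat))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos = max(-1.0, min(1.0, cos))
    return math.acos(cos)


def satisfies_margin(anchor, positive, negative, margin):
    """Hypothetical check: the same-class pair must be closer in angle
    than the different-class pair by at least `margin` radians."""
    return (angular_distance(anchor, positive) + margin
            <= angular_distance(anchor, negative))


# Toy 2-D features (assumed values): a natural sample, an adversarial
# sample of the same class, and a natural sample of another class.
natural = [1.0, 0.0]
adversarial_same_class = [1.0, 0.1]
other_class = [0.0, 1.0]

print(satisfies_margin(natural, adversarial_same_class, other_class, 0.5))
```

In this toy setting, intra-class compactness corresponds to a small angle between the natural and adversarial features of one class, and inter-class separation to a large angle to the other class.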
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Content] Media Interpretation
Relevance To Conference: The proposed StayFocused method contributes to multimedia/multimodal processing by enhancing the robustness and generalization of deep learning models under adversarial perturbations. In multimedia applications such as biomedical imaging, face recognition, and autonomous vehicles, where accurate and reliable processing of diverse data types is crucial, adversarial attacks pose significant security risks. By addressing the spatial attention shift and channel activation disarray induced by adversarial perturbations, StayFocused helps deep learning models maintain focus on relevant features and capture representations effectively across different modalities. This improves performance and reliability in multimedia processing tasks, enabling more accurate and trustworthy analysis of complex data streams. Additionally, StayFocused is versatile enough to be integrated into existing multimedia processing pipelines, further enhancing the overall robustness and effectiveness of these systems.
Supplementary Material: zip
Submission Number: 5526