Improving Native CNN Robustness with Filter Frequency Regularization

Published: 26 Dec 2023, Last Modified: 17 Sept 2024. Accepted by TMLR.
Abstract: Neural networks tend to overfit the training distribution and perform poorly on out-of-distribution data. A conceptually simple solution lies in adversarial training, which introduces worst-case perturbations into the training data and thus improves model generalization to some extent. However, it is only one ingredient towards generally more robust models, and it requires knowledge during training of the potential attacks or inference-time data corruptions. This paper focuses on the native robustness of models that can learn robust behavior directly from conventional training data without out-of-distribution examples. To this end, we study the frequencies in learned convolution filters. Clean-trained models often prioritize high-frequency information, whereas adversarial training pushes models to shift their focus to low-frequency details during training. By mimicking this behavior through frequency regularization of the learned convolution weights, we achieve improved native robustness to adversarial attacks, common corruptions, and other out-of-distribution tests. Additionally, this method shifts decision-making towards low-frequency information, such as shapes, which inherently aligns more closely with human vision.
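The abstract does not spell out the regularization term itself. As a rough illustration only, and not the authors' formulation, the sketch below measures how much of a convolution filter bank's spectral energy lies outside the zero-frequency (DC) component; a quantity like this could be added to a training loss to favor low-frequency filters. The function name `high_frequency_energy`, the choice of "non-DC energy" as the high-frequency measure, and the `(out_channels, in_channels, k, k)` weight layout are all assumptions made for this example.

```python
import numpy as np


def high_frequency_energy(weights: np.ndarray) -> float:
    """Mean fraction of spectral energy outside the DC component.

    Hypothetical helper, not the paper's regularizer.
    weights: array of shape (out_channels, in_channels, k, k).
    """
    # 2D discrete Fourier transform over the spatial axes of every filter.
    spectrum = np.fft.fft2(weights, axes=(-2, -1))
    energy = np.abs(spectrum) ** 2

    # Total energy per filter and the energy at zero frequency (index [0, 0]).
    total = energy.sum(axis=(-2, -1))
    dc = energy[..., 0, 0]

    # Fraction of energy at non-zero frequencies, averaged over all filters.
    # The small constant avoids division by zero for all-zero filters.
    return float(np.mean((total - dc) / (total + 1e-12)))


# A constant (blur-like) filter has all of its energy at zero frequency.
smooth = np.ones((1, 1, 3, 3))
# A zero-sum, Laplacian-like filter has no energy at zero frequency.
edge = np.array([[0.0, -1.0, 0.0],
                 [-1.0, 4.0, -1.0],
                 [0.0, -1.0, 0.0]]).reshape(1, 1, 3, 3)

print(high_frequency_energy(smooth))
print(high_frequency_energy(edge))
```

Under this assumed measure, the constant filter scores near 0 and the zero-sum filter scores near 1, which matches the intuition that edge-like filters respond to high-frequency content.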
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- We have included a further motivation and discussion on native robustness in Appendix A (as suggested by reviewer KFrx in W1/C1)
- We have included a thorough discussion about feature maps and weight regularization in Sec. 2 (as suggested by reviewer KFrx in W2/C2)
- We have included the theoretical motivation of the proposed regularization term in Appendix P (as suggested by reviewers KFrx in C4 and TkKe in W2)
- We have enlarged the text in Fig. 4 (as suggested by reviewer KFrx in C5)
- We have added attribution maps in Appendix N (as suggested by reviewer KFrx in C6)
- We have added Fourier spectra of activations in Appendix O (as suggested by reviewers b5AR in W1 and TkKe in W1.1)
- We have added the missing citation (Huang et al. (2023)) in Sec. 2 (as suggested by reviewer b5AR in W2)
- We have added an additional anti-aliasing comparison in Tab. 2 (as suggested by reviewer b5AR in W3/C2)
- We have added a computation time comparison in Tab. 3 (as suggested by reviewer b5AR in W4)
- We have removed abbreviations in hyperrefs (as suggested by reviewer b5AR in C4)
- We have added hyperrefs to appendix sections (as suggested by reviewer b5AR in C5)
- We have added a broader impact section in Appendix A (as suggested by reviewer b5AR under "Broader impact statement")
- We have added and analyzed Fourier spectra of activations in Appendix O (as suggested by reviewer TkKe in W1, to motivate the regularization of the weights in deeper layers)
- We have changed Fig. 5 to show a 3x3 kernel (as suggested by reviewer TkKe in W3.1)
- We have written Eq. 8 in proper notation and improved its legibility (as suggested by reviewer TkKe in W5)
- We have updated the indexing in Eq. 3.
Changes are highlighted in blue in the updated PDFs.
Code: https://github.com/jovitalukasik/filter_freq_reg
Supplementary Material: zip
Assigned Action Editor: ~Evan_G_Shelhamer1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1498