Exploring Simple, High Quality Out-of-Distribution Detection with L2 Normalization

Published: 19 Feb 2024, Last Modified: 19 Feb 2024. Accepted by TMLR.
Abstract: We demonstrate that L2 normalization over feature space can produce capable performance for Out-of-Distribution (OoD) detection for some models and datasets. Although it does not demonstrate outright state-of-the-art performance, this method is notable for its extreme simplicity: it requires only two additional lines of code, and does not need specialized loss functions, image augmentations, outlier exposure, or extra parameter tuning. We also observe that training may be more efficient for some datasets and architectures. Notably, only 60 epochs with ResNet18 on CIFAR10 (or 100 epochs with ResNet50) can produce performance within two percentage points (AUROC) of several state-of-the-art methods for some near and far OoD datasets. We provide theoretical and empirical support for this method, and demonstrate viability across five architectures and three In-Distribution (ID) datasets.
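The abstract describes the method as only two additional lines of code: L2-normalize the penultimate feature vector before it reaches the classifier head. Below is a minimal NumPy sketch of that idea; the paper's own Algorithm 1 is in PyTorch, and the function name and toy values here are purely illustrative, not the authors' code.

```python
import numpy as np

def l2_normalize_features(z, eps=1e-12):
    """L2-normalize each feature vector (row of z) to unit length.

    `eps` guards against division by zero for all-zero rows;
    the resulting rows would be fed to the classifier head in
    place of the raw features.
    """
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    return z / np.maximum(norms, eps)

# Toy "penultimate" features for a batch of 3 samples, 4 feature dims.
features = np.array([[3.0, 4.0, 0.0, 0.0],
                     [1.0, 1.0, 1.0, 1.0],
                     [0.5, -2.0, 0.0, 1.0]])

normalized = l2_normalize_features(features)
# Every row of `normalized` now has unit L2 norm.
```

In the changelog's terms, `features` corresponds to the "pre-normalized" and `normalized` to the "normalized" feature vectors.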
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- Changed "background and methodology" to just "background"
- Removed language positioning L2 normalization as a general design consideration, to instead focus on its specific potential for OoD detection
- Moved motivation out of related work to the top of a new methodology section
- Simplified and clarified the motivations, excluding claims that lack evidence
- Expanded related work to include Haas et al. (2022), and clarified what that paper contributed so that our paper's additional contribution is clear
- Added formal definitions of NC measurements to section 3.2 (Eqs. 2, 3, 4)
- Changed "competitive performance" to "strong performance" or "compares well"
- Created a new methodology section
- Rewrote sections 3.1 and 3.2 (previously in background and methodology)
- Clarified and simplified explanations to convey the main idea more strongly
- Clarified that NC is not required for our method to work; our method works because we mitigate cross-entropy. Clarified that there could be exceptions to the optimization behavior, despite several papers formalizing these dynamics
- Equation 8: removed the overline and added a fraction to more clearly indicate an average
- Fixed typos
- Changed language: "For our use case ... feature not a bug"
- Clarified the definitions of feature-level and class-level information in section 3.1
- Replaced the accuracy vs. norm size bar plot with a scatter plot to make the correlation clearer (the bar plot contained superfluous information about the number of images per bin)
- Heavily revised experiments sections 4.1 and 4.2 for clarity and a better tie-in with the methodology sections and the intuition behind our method
- Moved the section on measuring the norm that arises from L2 from the experiments to appendix A2
- A reviewer recommended Chatterjee et al. 2020, which we find fits nicely as a theoretical explanation of the correlation between feature updates and model confidence/accuracy, so we added it to the theoretical motivations in the new methodology section 3.3
- Added a succinct PyTorch example to clarify exactly what the method is and to make it easier to refer to "pre-normalized" and "normalized" feature vectors (Algorithm 1: L2 Normalization of Features)
- Modified the abstract to reflect language changes and the scope update
- Improved the problem setup (section 2.1), and noted the addition of new models
- Clarified our claims about neural collapse and cross-entropy in section 3.1; noted exceptions where the theory may not result in a strong equinormality condition
- Updated section 3.2 to indicate possible sources of noise affecting the linkage between norm size and weight updates
- Updated training details in the appendix
- Re-arranged the appendix, removing some content for clarity
- Extended appendix Table 4 with additional results: Compact Convolutional Transformers, ConvNeXt_tiny, and full training runs for ResNet18/50 in the L2 case
- Added new appendix tables (5 and 6: GTSRB and Tiny ImageNet as ID datasets)
- Revised Table 1 to include ID accuracy scores when reported, and heavily revised its caption for clarity
Assigned Action Editor: ~Jaehoon_Lee2
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 1707