GROD: Enhancing Generalization of Transformer with Out-of-Distribution Detection

Yijin Zhou; Yu Guang Wang

Published: 03 Jul 2024, Last Modified: 14 Jul 2024

Venue: ICML 2024 FM-Wild Workshop Poster

License: CC BY 4.0

Keywords: Out-of-Distribution Detection, Transformer networks

TL;DR: We propose GROD, an OOD detection framework motivated by our learning theory for OOD detection on Transformers, which shows interpretability and universality across various tasks.

Abstract: Transformer networks struggle to generalize to Out-of-Distribution (OOD) data, that is, data whose distribution differs from that seen during training. Building on an OOD detection framework grounded in Probably Approximately Correct (PAC) theory, we propose the Generate Rounded OOD Data (GROD) algorithm, a novel approach to enhancing the generalization of transformer networks across natural language processing (NLP) and computer vision (CV) datasets. GROD improves a transformer's ability to delineate the in-distribution (ID) decision boundary and to detect outliers. By incorporating synthetic outlier generation and penalizing OOD misclassification within the loss function, GROD refines model parameters and ensures robust performance. Empirical evaluations show that GROD achieves state-of-the-art (SOTA) results on NLP and CV tasks: on image classification it reduces the SOTA FPR@95 from 21.97% to 0.12% and improves AUROC from 93.62% to 99.98%, and in detecting semantic text outliers it improves the SOTA FPR@95 by 12.89% and AUROC by 2.27%. The code is available at https://anonymous.4open.science/r/GROD-OOD-Detection-with-transformers-B70F.
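
For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how FPR@95 and AUROC are commonly computed from detector scores. It is not taken from the GROD codebase: the function names `fpr_at_95_tpr` and `auroc`, and the convention that a higher score means "more likely ID", are assumptions made for illustration.

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when about 95% of ID samples are accepted.

    Assumption: higher score = more likely in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold at the 5th percentile of ID scores, so roughly 95% of ID
    # samples score at or above it (true positive rate of about 95%).
    threshold = np.percentile(id_scores, 5)
    # OOD samples at or above the threshold are false positives.
    return float(np.mean(ood_scores >= threshold))

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as one half. Uses an O(n*m) pairwise comparison, which is
    fine for a small illustration but not for large evaluation sets.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    diff = id_scores[:, None] - ood_scores[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    toy_id = [1.0, 2.0, 3.0, 4.0]
    toy_ood = [0.0, 2.5]
    print("FPR@95:", fpr_at_95_tpr(toy_id, toy_ood))
    print("AUROC:", auroc(toy_id, toy_ood))
```

Lower FPR@95 and higher AUROC both indicate better separation between ID and OOD samples, which is the direction of the improvements reported above.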

Submission Number: 15
