Mitigating Transformer Overconfidence via Lipschitz Regularization

Wenqian Ye; Yunsheng Ma; Xu Cao; Kun Tang

Published: 08 May 2023, Last Modified: 22 Jun 2025 | UAI 2023 | Readers: Everyone

Abstract: Though Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in their predictions, as the standard Dot Product Self-Attention (DPSA) can barely preserve distance over an unbounded input domain. In this work, we fill this gap by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we present a new similarity function based on distance in a Banach space to ensure Lipschitzness, and we regularize this term with a contractive Lipschitz bound. The proposed method is analyzed with a theoretical guarantee, providing a rigorous basis for its effectiveness and reliability. Extensive experiments conducted on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.

Other Supplementary Material: zip

Supplementary Material: pdf

Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/mitigating-transformer-overconfidence-via/code)
