Keywords: Outlier Efficiency, Outlier Robustness, Transformer-Based Model, Large Language Model, Foundation Model, Modern Hopfield Model, Attention Mechanism
Abstract: We introduce a principled approach to Outlier-Efficient Attention Layers via associative memory models to reduce outlier emergence in large transformer-based models. Our main contribution is a novel associative memory model that facilitates outlier-efficient associative memory retrievals. This model subsumes the outlier-efficient attention mechanism (`Softmax_1`) as a special case of its memory retrieval process. Methodologically, this enables the introduction of novel outlier-efficient Hopfield layers as powerful alternatives to traditional attention mechanisms, offering superior post-quantization performance. Empirically, we demonstrate the efficacy of the proposed model across large-scale transformer-based and Hopfield-based models, including BERT, OPT, ViT, and STanHop-Net, benchmarking against state-of-the-art methods such as `Clipped_Softmax` and `Gated_Attention`. Notably, our method achieves an average reduction of over 22% in average kurtosis and over 26% in the maximum infinity norm of model outputs across the four models, without sacrificing model performance after quantization.
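For context on the `Softmax_1` mechanism referenced above, the following is a minimal sketch assuming the commonly used definition in which a unit term is added to the softmax denominator, allowing an attention head to assign near-zero total weight to its keys; the function and tensor names are illustrative and not taken from the paper's codebase.

```python
import torch

def softmax_1(logits: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Softmax with an extra unit term in the denominator:
    softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).
    The extra 1 lets a head attend to "nothing", the property
    associated with reduced activation outliers."""
    # Shift by the (non-negative) max for numerical stability; the
    # constant 1 must be rescaled by the same factor exp(-m).
    m = logits.max(dim=dim, keepdim=True).values.clamp(min=0.0)
    exp_shifted = torch.exp(logits - m)
    return exp_shifted / (torch.exp(-m) + exp_shifted.sum(dim=dim, keepdim=True))

def outlier_efficient_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Scaled dot-product attention with softmax_1 replacing softmax.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return softmax_1(scores, dim=-1) @ v
```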
Submission Number: 61