The Role of Aggregation Functions on Transformers and ViTs Self-Attention for Classification

Joelson Sartori, Rodrigo de Bem, Graçaliz Pereira Dimuro, Giancarlo Lucca

Published: 2023, Last Modified: 28 Feb 2026. SIBGRAPI 2023. CC BY-SA 4.0

Abstract: Aggregation functions are mathematical operations that combine or summarize a set of values into a single representative value. They play a crucial role in the attention mechanisms of Transformer neural networks. However, the Transformer's default aggregation, based on matrix multiplication, may have limitations in certain classification scenarios: it may struggle with the complexity of the information present in the input data, resulting in lower accuracy and efficiency. To address this issue, the present work aims to replace the matrix multiplication used in the classical attention mechanism with alternative, more general aggregation functions. To validate the new aggregation methods in the attention mechanism, we conducted experiments on two datasets, the recently proposed Google American Sign Language (ASL) Fingerspelling Recognition dataset and the well-known CIFAR-10, performing time series and image classification, respectively. The results shed light on the role of aggregation functions in classification with Transformers, showing promising outcomes and potential for further improvement.
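
To make the idea concrete, the sketch below treats a matrix product as an element-wise product followed by a sum over the shared index, and makes that final sum pluggable. It is an illustration only, not the authors' implementation: the abstract does not say which product in the attention mechanism is replaced or which aggregation functions are used, so the names `generalized_product` and `attention`, the choice to swap the aggregator in both products, and the use of `max` as an alternative are all assumptions.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]
Aggregator = Callable[[Sequence[float]], float]


def generalized_product(a: Matrix, b: Matrix, agg: Aggregator = sum) -> Matrix:
    """Matrix product with the inner summation replaced by `agg`.

    Entry (i, j) is agg over k of a[i][k] * b[k][j]. With agg=sum this
    is the ordinary matrix product.
    """
    cols_b = list(zip(*b))  # columns of b
    return [
        [agg([x * y for x, y in zip(row, col)]) for col in cols_b]
        for row in a
    ]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix, agg: Aggregator = sum) -> Matrix:
    """Scaled dot-product attention with a pluggable aggregator.

    Hypothetical design: `agg` replaces the sum in both products
    (queries with keys, and weights with values). agg=sum recovers
    classical attention.
    """
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = generalized_product(q, k_t, agg)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return generalized_product(weights, v, agg)


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

classical = attention(q, k, v)            # sum: standard attention
max_based = attention(q, k, v, agg=max)   # max: one alternative aggregator
```

Passing `agg=sum` gives the classical mechanism, so any alternative aggregator can be compared against it on the same inputs. Pure Python lists are used here only to keep the sketch dependency-free; a practical version would operate on tensors.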

External IDs: dblp:conf/sibgrapi/SartoriBDL23