Abstract: In this paper, we modify the attention mechanism in Vision Transformer (ViT) models by replacing the standard self-attention with a generalization of the Sugeno integral defined by various aggregation functions, which have been applied extensively in the literature. We evaluate our models by image classification accuracy on multiple datasets, including CIFAR-10, CIFAR-100, Caltech-101, and COCO, which supports the generality of our findings. The results indicate that the ViT attention mechanism is highly robust, behaving similarly even when distinct aggregation functions are employed. Our study sheds light on the relationship between the attention mechanism and aggregation functions, contributing to a better understanding of their role and functioning.
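For readers unfamiliar with the operator named in the abstract, the following is a minimal sketch of a discrete Sugeno integral and one common way of generalizing it, namely swapping the inner minimum and outer maximum for other functions. It is not the authors' implementation and does not show how the operator is wired into ViT attention. The names `generalized_sugeno` and `power_measure`, and the choice of a cardinality-based fuzzy measure, are illustrative assumptions.

```python
def power_measure(fraction, q=1.0):
    """Hypothetical cardinality-based fuzzy measure: depends only on the
    fraction of elements in the set, with mu(empty) = 0 and mu(all) = 1."""
    return fraction ** q


def generalized_sugeno(values, measure=power_measure, inner=min, outer=max):
    """Discrete Sugeno-like integral of values in [0, 1].

    With inner=min and outer=max this is the standard discrete Sugeno
    integral for a cardinality-based measure. Passing other functions
    gives a generalized form (an assumption about the family the
    abstract refers to, not the paper's exact definition).
    """
    xs = sorted(values)  # x_(1) <= ... <= x_(n)
    n = len(xs)
    # The i-th sorted value (0-based) is paired with the measure of the
    # set holding it and every larger value, i.e. n - i elements.
    terms = [inner(x, measure((n - i) / n)) for i, x in enumerate(xs)]
    return outer(terms)


scores = [0.2, 0.9, 0.5]
standard = generalized_sugeno(scores)
product_sum = generalized_sugeno(scores, inner=lambda a, b: a * b, outer=sum)
```

With `inner=min` and `outer=max` the function reduces to the classical Sugeno integral; the product-and-sum variant yields a Choquet-like weighted sum, one example of the kind of alternative aggregation such a study might compare.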
External IDs: dblp:conf/fuzzIEEE/SartoriSBDLMB25