Unitention: Attend a sample to the dataset

15 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Neural Network Backbone, Image Classification, Cross Attention
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose a plug-and-play trainable module that can improve any classification model by universal-individual cross-attention (Unitention).
Abstract: We propose an end-to-end trainable module termed Unitention, an abbreviation for universal-individual cross-attention, which improves the deep features of a given neural network by attending the feature of a data sample to those of the entire dataset. This design is motivated by two key observations: (i) traditional visual encoding methods, such as Bag of Visual Words, encode an image using a universal dataset-wide codebook, while (ii) deep models typically process each data sample in isolation, without explicitly using any universal information. Unitention bridges this gap by attentively merging universal and individual features, thereby complementing and enhancing the given deep model. We evaluate its efficacy on various classification benchmarks and model architectures. On ImageNet, Unitention improves the accuracy of different ConvNets and Transformers; in particular, some k-NN classifiers with Unitention even outperform baseline classifiers. Improvements on fine-grained tasks are more substantial (up to 2.3%). Further validation on other modalities confirms Unitention's versatility. In summary, Unitention reveals the potential of using dataset-level information to enhance deep features, opening up a new backbone-independent direction for improving neural networks, orthogonal to mainstream research on backbone architecture design.
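The abstract describes a module in which each sample's feature cross-attends to dataset-level features. Below is a minimal PyTorch sketch of one plausible instantiation; the names (`UnitentionHead`, `universal_bank`, `num_slots`) and the use of a learnable slot bank as the stand-in for dataset-wide information are assumptions for illustration, not the paper's actual design, which may instead aggregate real dataset features (e.g., prototypes).

```python
# Hypothetical sketch of universal-individual cross-attention, assuming the
# "universal" side is a learnable bank of dataset-level feature slots.
import torch
import torch.nn as nn


class UnitentionHead(nn.Module):
    def __init__(self, dim: int, num_slots: int = 256):
        super().__init__()
        # Learnable bank standing in for dataset-wide information
        # (the paper's module may build this from actual dataset features).
        self.universal_bank = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -- pooled individual features from any backbone.
        q = self.q_proj(x)                    # queries from the sample
        k = self.k_proj(self.universal_bank)  # keys from universal slots
        v = self.v_proj(self.universal_bank)  # values from universal slots
        attn = torch.softmax(q @ k.t() * self.scale, dim=-1)  # (batch, slots)
        # Residually merge universal context into the individual feature,
        # so the module is plug-and-play on top of an existing backbone.
        return x + attn @ v
```

As a usage pattern, such a head would sit between the backbone and the classifier, e.g., `logits = fc(head(backbone(images)))`, matching the abstract's claim that the module is plug-and-play and backbone-independent.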
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 111