Towards Discrete Object Representations in Vision Transformers with Tensor Products

Published: 01 Jan 2023, Last Modified: 10 Nov 2024. Venue: CSAI 2023. License: CC BY-SA 4.0
Abstract: In this work, we explore the use of Tensor Product Representations (TPRs) in a Vision Transformer to form image representations that can later be manipulated symbolically in a neurosymbolic model. We propose the Tensor Product Vision Transformer (TP-ViT), a Vision Transformer extended with TPRs, an object representation methodology that uses filler and role vectors to represent objects. TP-ViT is the first application of TPRs to visual input, and we report qualitative and quantitative results showing that TPRs allow the model to form more targeted and diverse object representations than a standard Vision Transformer.
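To make the filler/role idea concrete, the following is a minimal sketch of generic TPR binding and unbinding. It is not the TP-ViT architecture, which the abstract does not detail. It assumes the standard formulation in which a structure is encoded as the sum of outer products of filler and role vectors, and in which orthonormal roles allow each filler to be recovered exactly. The function names `bind` and `unbind` and the toy vectors are illustrative assumptions.

```python
import math

def bind(fillers, roles):
    """Sum of outer products f_k r_k^T, giving a (d_filler x d_role) matrix."""
    d_f, d_r = len(fillers[0]), len(roles[0])
    tpr = [[0.0] * d_r for _ in range(d_f)]
    for f, r in zip(fillers, roles):
        for i in range(d_f):
            for j in range(d_r):
                tpr[i][j] += f[i] * r[j]
    return tpr

def unbind(tpr, role):
    """Multiply the TPR matrix by a role vector to read out its filler.

    Exact recovery holds only when the roles are orthonormal (assumed here).
    """
    return [sum(row[j] * role[j] for j in range(len(role))) for row in tpr]

# Hypothetical toy data: three "objects" with 2-d fillers and 3-d roles.
s = 1.0 / math.sqrt(2.0)
roles = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]   # orthonormal rows
fillers = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0]]

tpr = bind(fillers, roles)                    # one matrix holds all three bindings
recovered = [unbind(tpr, r) for r in roles]   # one filler per role

for original, readout in zip(fillers, recovered):
    print(original, [round(x, 6) for x in readout])
```

Because the roles are orthonormal, each readout matches its original filler up to floating-point error. With learned, non-orthogonal roles the readout would only be approximate.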