Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism

Published: 01 Jan 2024, Last Modified: 21 May 2025. Venue: Neurocomputing 2024. License: CC BY-SA 4.0
Abstract: Highlights
• The Mix-Tower model combines the strengths of both single-tower and dual-tower models.
• The same model architecture is used to process different types of data.
• A lightweight FFN module has been incorporated into the Transformer.
• Compared to other baseline models, our model achieves superior performance.