Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism

Deguang Chen, Jianrui Chen, Luheng Yang, Fanhua Shang

Published: 2024, Last Modified: 31 Jan 2026, Neurocomputing 2024, CC BY-SA 4.0

Abstract: Highlights
• The Mix-Tower model combines the strengths of both single-tower and dual-tower models.
• The same model architecture is used to process different types of data.
• A lightweight FFN module has been incorporated into the Transformer.
• Compared to other baseline models, our model achieves superior performance.