PILL: Plug into LLM with adapter expert and attention gate

Published: 01 Jan 2024 · Last Modified: 19 Feb 2025 · Appl. Soft Comput. 2024 · CC BY-SA 4.0
Abstract (Highlights):
• PILL bridges the gap between pre-trained LMs and multimodal understanding.
• MAG prevents visual information from interfering with the LLM's text modeling.
• MoMAE equips each modality with dedicated FFNs to address the modal entanglement issue.
• PILL exhibits superior efficiency and competitive performance.
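The two mechanisms named in the highlights can be illustrated with a minimal, self-contained PyTorch sketch: a learnable attention gate that scales how much visual context is injected back into the LLM's text hidden states (the MAG idea), and per-modality bottleneck FFN adapters so text and visual tokens are routed to separate experts (the MoMAE idea). This is only an illustrative sketch; all class names, parameters, and design details below are assumptions and not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class ModalityAttentionGate(nn.Module):
    """Hypothetical gate over cross-attention to visual tokens. The tanh-scaled
    gate starts at zero, so visual information is injected gradually and cannot
    overwhelm the frozen LLM's text modeling early in training."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # gate starts closed

    def forward(self, text_h: torch.Tensor, vis_h: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.cross_attn(text_h, vis_h, vis_h)
        return text_h + torch.tanh(self.gate) * attn_out


class MixtureOfModalityAdapters(nn.Module):
    """Hypothetical per-modality adapter experts: each token is processed by a
    dedicated bottleneck FFN chosen by its modality id, keeping text and visual
    representations disentangled."""

    def __init__(self, d_model: int, bottleneck: int = 64, n_modalities: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, bottleneck),
                nn.GELU(),
                nn.Linear(bottleneck, d_model),
            )
            for _ in range(n_modalities)
        )

    def forward(self, h: torch.Tensor, modality_id: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model); modality_id: (batch, seq) ints, e.g. 0=text, 1=visual
        out = h.clone()
        for m, expert in enumerate(self.experts):
            mask = modality_id == m
            if mask.any():
                out[mask] = h[mask] + expert(h[mask])  # residual adapter per modality
        return out
```

In this sketch both modules would be inserted into a frozen LLM block and trained alone, which matches the highlights' emphasis on efficiency, but the exact placement and training recipe are not specified here.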