\textbf{TEAL}: \textbf{T}okenize and \textbf{E}mbed \textbf{ALL} for Multi-modal Large Language Models

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Although Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they still struggle to efficiently model interactions among multi-modal inputs and to generate outputs in non-textual modalities. In this work, we propose \textit{TEAL (Tokenize and Embed ALL)}, an approach that treats the input from any modality as a token sequence and learns a joint embedding space for all modalities. Specifically, for the input from any modality, \textit{TEAL} first discretizes it into a token sequence with an off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix. The MM-LLM then only needs to predict the multi-modal tokens autoregressively, as conventional textual LLMs do. Finally, the corresponding de-tokenizer is applied to generate the output in each modality from the predicted token sequence. With the joint embedding space, \textit{TEAL} enables frozen LLMs to perform both understanding and generation tasks involving non-textual modalities, such as image and audio. Thus, the textual LLM simply works as an interface while maintaining its high performance in textual understanding and generation. Experiments show that \textit{TEAL} achieves substantial improvements in multi-modal understanding and provides a simple scheme for multi-modal generation.
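The pipeline described in the abstract can be made concrete with a short sketch. The PyTorch code below is a minimal illustration of the idea under stated assumptions, not the authors' implementation: `frozen_llm` is assumed to be a Hugging Face-style backbone that accepts `inputs_embeds`, the off-the-shelf tokenizer and de-tokenizer (e.g., a VQ image codec) are assumed to live outside the module, and all names here (`TEALSketch`, `mm_embed`, `mm_head`) are hypothetical.

```python
import torch
import torch.nn as nn

class TEALSketch(nn.Module):
    """Minimal sketch of the TEAL idea (hypothetical names throughout):
    non-textual inputs are discretized by an external, off-the-shelf
    tokenizer; a learnable embedding matrix maps those token ids into the
    frozen LLM's embedding space; the LLM predicts the next token
    autoregressively; an external de-tokenizer reconstructs the output."""

    def __init__(self, frozen_llm, non_text_vocab_size: int, d_model: int):
        super().__init__()
        self.llm = frozen_llm  # pretrained textual LLM backbone
        for p in self.llm.parameters():
            p.requires_grad = False  # the textual LLM stays frozen
        # Learnable embedding matrix for non-textual tokens (image/audio codes)
        self.mm_embed = nn.Embedding(non_text_vocab_size, d_model)
        # Learnable head projecting hidden states back onto the non-textual codebook
        self.mm_head = nn.Linear(d_model, non_text_vocab_size)

    def forward(self, text_embeds: torch.Tensor, mm_token_ids: torch.Tensor):
        # 1) An off-the-shelf tokenizer (e.g., a VQ image codec) has already
        #    turned the image/audio input into discrete ids `mm_token_ids`.
        mm_embeds = self.mm_embed(mm_token_ids)
        # 2) Concatenate modalities into one sequence in the joint embedding space.
        seq = torch.cat([text_embeds, mm_embeds], dim=1)
        # 3) The frozen LLM models the joint sequence exactly as it models text
        #    (assuming a backbone that accepts `inputs_embeds`).
        hidden = self.llm(inputs_embeds=seq).last_hidden_state
        # 4) Predict non-textual tokens; the matching de-tokenizer (e.g., the
        #    VQ decoder) would map predicted ids back to pixels or waveforms.
        return self.mm_head(hidden)
```

Only `mm_embed` and `mm_head` carry gradients here, which matches the abstract's claim that the frozen LLM works purely as an interface over the joint token space.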
Paper Type: long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability
Languages Studied: English
Preprint Status: There is a non-anonymous preprint (URL specified in the next question).
A1: yes
A2: yes
A3: yes
B: yes
B1: yes
B2: yes
B3: yes
B4: yes
B5: no
B6: yes
C: yes
C1: yes
C2: yes
C3: yes
C4: yes
D: no
D1: yes
D2: yes
D3: yes
D4: yes
D5: yes
E: no
E1: no