Multimodal LLM Alignment: Challenges, Solutions, and Research Opportunities

ACL ARR 2025 February Submission 268 Authors

05 Feb 2025 (modified: 09 May 2025), ACL ARR 2025 February Submission, CC BY 4.0

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive potential in handling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, and alignment with human preferences remain insufficiently addressed. This gap has spurred the development of various alignment algorithms, and recent studies show that such algorithms are an effective way to address these challenges. In this paper, we provide a comprehensive and systematic review of MLLM alignment algorithms. Specifically, we address four critical questions: (1) What application scenarios do existing alignment algorithms cover? (2) How are alignment datasets constructed? (3) How are alignment algorithms evaluated? (4) What are the future directions for the development of alignment algorithms? This work aims to help researchers organize current advances in the field and to inspire better alignment methods.

Paper Type: Long

Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond

Research Area Keywords: Multimodal Large Language Model, MLLM Alignment, Survey, Alignment with Human Preference

Contribution Types: Surveys

Languages Studied: English, Chinese

Submission Number: 268
