Can We Debias Multimodal Large Language Models via Model Editing?

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Multimodal large language models (MLLMs) have been observed to exhibit biases originating from their training datasets. Unlike unimodal LLMs, biases in MLLMs may stem from interactions between multiple modalities, which increases the complexity of multimodal debiasing. Conventional approaches such as fine-tuning to alleviate biases are costly and data-hungry. Model editing methods, which focus on post-hoc modification of model knowledge, have recently demonstrated significant potential across diverse applications. They can effectively and precisely adjust model behavior within specific knowledge domains while minimizing the impact on overall model performance. However, there is currently no comprehensive study that applies model editing methods to MLLM debiasing and analyzes their pros and cons. To facilitate research in this field, we formulate MLLM debiasing as an editing problem and propose a novel set of evaluation metrics for MLLM debias editing. Through various experiments, we demonstrate that: (1) Existing model editing methods can effectively alleviate biases in MLLMs and generalize well to semantically equivalent image-text pairs; however, most methods tend to adversely affect the stability of the MLLM. (2) Compared to editing the visual modality of the MLLM, editing the textual modality yields better results in addressing MLLM biases. (3) Model-editing-based debiasing can generalize across different types of biases.
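To make the evaluation setting concrete, below is a minimal sketch of how the standard model-editing metrics (reliability, generality, locality) could be computed for a debiasing edit on an MLLM. The `generate()` interface, the `query` helper, and field names such as `unbiased_answer` and `rephrased_pairs` are illustrative assumptions, not the paper's actual implementation or dataset schema.

```python
def query(model, image, prompt):
    """Return the model's answer for an image-text pair (hypothetical API)."""
    return model.generate(image=image, prompt=prompt)

def evaluate_debias_edit(base_model, edited_model, edit_case, locality_cases):
    # Reliability: the edited image-text pair itself should now receive
    # the unbiased answer.
    reliability = float(
        query(edited_model, edit_case["image"], edit_case["prompt"])
        == edit_case["unbiased_answer"]
    )

    # Generality: semantically equivalent image-text rephrasings of the
    # edited case should also receive the unbiased answer.
    generality = sum(
        query(edited_model, img, prompt) == edit_case["unbiased_answer"]
        for img, prompt in edit_case["rephrased_pairs"]
    ) / len(edit_case["rephrased_pairs"])

    # Locality (stability): unrelated image-text pairs should keep the
    # pre-edit behavior of the base model.
    locality = sum(
        query(edited_model, c["image"], c["prompt"])
        == query(base_model, c["image"], c["prompt"])
        for c in locality_cases
    ) / len(locality_cases)

    return {"reliability": reliability, "generality": generality, "locality": locality}
```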
Primary Subject Area: [Generation] Social Aspects of Generative AI
Secondary Subject Area: [Content] Vision and Language, [Generation] Multimedia Foundation Models
Relevance To Conference: Training multimodal models on large corpora inevitably leads to the emergence of stereotypes and biases, which can have detrimental effects on society. Mitigating these biases through fine-tuning can be both costly and data-hungry, and fine-tuning can also lead to catastrophic forgetting and overfitting. Model editing approaches, which focus on post-hoc modification of models, hold immense potential for addressing biases. In recent years, model editing techniques have developed significantly, aiming to swiftly and accurately modify large language models so that they generate more accurate and relevant outputs. This paper takes the first step in introducing a series of multimodal model editing methods specifically designed for debiasing multimodal models. Through the construction of a novel dataset, we conduct a comprehensive analysis of the generalizability, reliability, and locality of various model editing methods on debiasing tasks. Furthermore, we investigate the contributions of different components within multimodal LLMs to multimodal debiasing. Finally, we explore the potential of extending model editing methods from editing biases in one domain to addressing biases in other domains.
Supplementary Material: zip
Submission Number: 4891