Keywords: Low-resource languages, Multimodal large language model, Knowledge distillation, Data collection, Two-stage training method, Language adaptation stage, Knowledge enhancement stage
TL;DR: This paper addresses the challenge of low-resource languages by collecting multimodal data and proposing a two-stage training approach to improve multimodal large language models' performance in these contexts.
Abstract: In recent years, open-source Multimodal Large Language Models (MLLMs) have developed rapidly, but their strengths remain concentrated in mainstream languages such as English and Chinese. Because data for non-mainstream languages are relatively scarce, these models perform poorly in low-resource languages: they struggle not only to understand and generate them fluently but also to grasp the knowledge familiar to their speakers. Recognizing the importance of low-resource-language data, this paper collects multimodal data containing low-resource-language knowledge from relevant websites. Moreover, we propose a two-stage training approach to improve multimodal large language models in low-resource-language contexts. In the first stage, multimodal capabilities are transferred to low-resource languages; in the second stage, the model is further supplemented with the knowledge in the collected dataset. Experimental results demonstrate that this data collection strategy and training method effectively extend MLLMs' multimodal capabilities to low-resource languages and enable them to perform better in such contexts.
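The abstract's two-stage recipe (language adaptation, then knowledge enhancement) can be pictured as sequential fine-tuning phases that update different parts of the model on different data. The sketch below is a minimal PyTorch illustration of that idea only; the module names, the choice of which submodules to freeze in each stage, the learning rates, and the dummy data are all assumptions for illustration, not the paper's actual implementation.

```python
# Minimal two-stage fine-tuning sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

class ToyMLLM(nn.Module):
    """Stand-in for an MLLM: vision encoder + projector + language model."""
    def __init__(self, dim=64, vocab=1000):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)    # placeholder for a ViT
        self.projector = nn.Linear(dim, dim)         # maps image features into LM space
        self.language_model = nn.Linear(dim, vocab)  # placeholder for the LLM head

    def forward(self, image_feats):
        return self.language_model(self.projector(self.vision_encoder(image_feats)))

def run_stage(model, batches, trainable, lr=1e-4):
    """Freeze all parameters, unfreeze only the submodules trained in this stage."""
    for p in model.parameters():
        p.requires_grad = False
    for name in trainable:
        for p in getattr(model, name).parameters():
            p.requires_grad = True
    opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for feats, labels in batches:
        opt.zero_grad()
        loss = loss_fn(model(feats), labels)
        loss.backward()
        opt.step()

model = ToyMLLM()
dummy_batches = [(torch.randn(8, 64), torch.randint(0, 1000, (8,))) for _ in range(4)]

# Stage 1 (language adaptation): adapt the projector and LM so multimodal
# capabilities transfer to the low-resource language (dummy data here).
run_stage(model, dummy_batches, trainable=["projector", "language_model"])

# Stage 2 (knowledge enhancement): continue fine-tuning on the collected
# knowledge-rich multimodal data, typically at a lower learning rate.
run_stage(model, dummy_batches, trainable=["language_model"], lr=1e-5)
```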
Submission Number: 83