Keywords: Low-resource languages, Multimodal large language model, Knowledge distillation, Data collection, Two-stage training method, Language adaptation stage, Knowledge enhancement stage
TL;DR: This paper addresses the challenge of low-resource languages by collecting multimodal data and proposing a two-stage training approach to improve multimodal large language models' performance in these contexts.
Abstract: In recent years, open-source Multimodal Large Language Models (MLLMs) have developed rapidly, but their strengths remain concentrated in mainstream languages such as English and Chinese. Because data for non-mainstream languages are relatively scarce, these models perform poorly in low-resource languages: they struggle not only to understand and generate them fluently but also to grasp the knowledge familiar to their speakers. Recognizing the importance of low-resource-language data, this paper collects multimodal data containing low-resource-language knowledge from relevant websites. Moreover, we propose a two-stage training approach to improve multimodal large language models in low-resource-language contexts. In the first stage, multimodal capabilities are transferred to low-resource languages; in the second stage, the model is further supplemented with the knowledge in the collected dataset. Experimental results demonstrate that this data collection strategy and training method effectively extend MLLMs' multimodal capabilities to low-resource languages and enable them to perform better in such contexts.
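The abstract's two-stage recipe (language adaptation, then knowledge enhancement) can be pictured as sequential fine-tuning phases that update different parts of the model on different data. The sketch below is a minimal PyTorch illustration of that idea only; the module names, the choice of which submodules to freeze in each stage, the learning rates, and the dummy data are all assumptions for illustration, not the paper's actual implementation.

```python
# Minimal two-stage fine-tuning sketch (illustrative assumptions throughout).
import torch
import torch.nn as nn

class ToyMLLM(nn.Module):
    """Stand-in for an MLLM: vision encoder + projector + language model."""
    def __init__(self, dim=64, vocab=1000):
        super().__init__()
        self.vision_encoder = nn.Linear(dim, dim)    # placeholder for a ViT
        self.projector = nn.Linear(dim, dim)         # maps image features into LM space
        self.language_model = nn.Linear(dim, vocab)  # placeholder for the LLM head

    def forward(self, image_feats):
        return self.language_model(self.projector(self.vision_encoder(image_feats)))

def run_stage(model, batches, trainable, lr=1e-4):
    """Freeze all parameters, unfreeze only the submodules trained in this stage."""
    for p in model.parameters():
        p.requires_grad = False
    for name in trainable:
        for p in getattr(model, name).parameters():
            p.requires_grad = True
    opt = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for feats, labels in batches:
        opt.zero_grad()
        loss = loss_fn(model(feats), labels)
        loss.backward()
        opt.step()

model = ToyMLLM()
dummy_batches = [(torch.randn(8, 64), torch.randint(0, 1000, (8,))) for _ in range(4)]

# Stage 1 (language adaptation): adapt the projector and LM so multimodal
# capabilities transfer to the low-resource language (dummy data here).
run_stage(model, dummy_batches, trainable=["projector", "language_model"])

# Stage 2 (knowledge enhancement): continue fine-tuning on the collected
# knowledge-rich multimodal data, typically at a lower learning rate.
run_stage(model, dummy_batches, trainable=["language_model"], lr=1e-5)
```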
Submission Number: 83