MindLLM: Lightweight large language model pre-training, evaluation and domain application

Published: 01 Jan 2024, Last Modified: 13 May 2025. AI Open 2024. CC BY-SA 4.0
Abstract:

Highlights
• This study presents MindLLM, a novel bilingual lightweight large language model trained from scratch. Diverse bilingual training data is collected and used for pre-training, guided by preliminary experiments on the data.
• Our evaluation results show that the MindLLM models outperform larger models such as MPT-7B and GPT-J-6B on MMLU and AGIEval.
• Leveraging data tailored to a particular ability during instruction tuning can significantly enhance that ability in lightweight models.
• We introduce an approach to constructing an instruction set using an entropy-based quality filtering strategy and demonstrate its effectiveness in selecting high-quality instruction tuning data for lightweight models (see the illustrative sketch after this list).
• Our models achieve strong performance in specific domains, particularly law and finance.
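The abstract does not detail the entropy-based quality filtering strategy; the following is a minimal illustrative sketch, assuming the filter scores each instruction sample by its average next-token predictive entropy under a reference language model and keeps samples within a target entropy band. The reference model ("gpt2"), the thresholds, and the helper names are placeholders, not the paper's actual implementation.

```python
# Hypothetical sketch of entropy-based instruction filtering (not the paper's exact method).
# Assumes a HuggingFace causal LM as a reference scorer; model name and thresholds are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"          # placeholder reference model
LOW, HIGH = 1.0, 4.5         # hypothetical entropy band (nats per token)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def avg_token_entropy(text: str) -> float:
    """Average next-token predictive entropy of `text` under the reference LM."""
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).input_ids
    logits = model(ids).logits[0, :-1]                           # predictions for positions 1..n-1
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)    # per-position entropy
    return entropy.mean().item()

def filter_instructions(samples: list[str]) -> list[str]:
    """Keep samples whose average entropy lies inside the target band."""
    return [s for s in samples if LOW <= avg_token_entropy(s) <= HIGH]

if __name__ == "__main__":
    data = [
        "Explain the difference between civil and criminal law.",
        "asdf qwer zxcv 1234 !!!",
    ]
    print(filter_instructions(data))
```

In this sketch, very low entropy suggests degenerate or repetitive text and very high entropy suggests noisy or incoherent text, so the band acts as a simple quality gate; the actual criteria and thresholds used for MindLLM may differ.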