OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

Published: 21 Jun 2024, Last Modified: 26 Jul 2024, ES-FoMo-II 2024 Poster, CC BY 4.0
Keywords: Large Language Models, Efficient Language Models
TL;DR: We introduce OpenELM, a state-of-the-art and efficient LLM family with open training and inference frameworks.
Abstract: The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the model, leading to enhanced accuracy. For example, with a budget of around one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2X fewer pre-training tokens. Our source code, along with pre-trained model weights and training recipes, is available at https://github.com/apple/corenet. OpenELM HuggingFace models can be found at: https://huggingface.co/apple/OpenELM.
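
To give a rough feel for the layer-wise scaling idea mentioned in the abstract, the sketch below assigns each transformer layer its own attention-head count and FFN width by interpolating scale factors across depth. This is a minimal illustration under assumed parameter names and bounds (`head_dim`, `alpha`, `beta`), not the corenet implementation.

```python
# Minimal sketch of layer-wise scaling: instead of a uniform per-layer
# configuration, each transformer layer receives its own number of attention
# heads and FFN width by linearly interpolating scale factors across depth.
# Parameter names and bounds here are illustrative assumptions.

def layer_wise_config(num_layers, model_dim, head_dim=64,
                      alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    """Return a per-layer list of (num_heads, ffn_dim) tuples."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0 at the first layer, 1 at the last
        # Fraction of model_dim devoted to attention at this depth.
        a = alpha[0] + t * (alpha[1] - alpha[0])
        num_heads = max(1, int(round(a * model_dim / head_dim)))
        # FFN expansion ratio at this depth.
        b = beta[0] + t * (beta[1] - beta[0])
        ffn_dim = int(round(b * model_dim))
        configs.append((num_heads, ffn_dim))
    return configs

# Example: an 8-layer model with model_dim=512 allocates fewer parameters
# to early layers and more to later ones.
for layer_idx, (heads, ffn) in enumerate(layer_wise_config(8, 512)):
    print(f"layer {layer_idx}: heads={heads}, ffn_dim={ffn}")
```

Under this kind of scheme, the total parameter budget stays comparable to a uniform model while capacity is shifted toward the layers where it is assumed to help accuracy most.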
Submission Number: 15