MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

ACL ARR 2024 April Submission 811 Authors

16 Apr 2024 (modified: 02 May 2024), ACL ARR 2024 April Submission, CC BY 4.0
Abstract: "Bigger is better" has been the predominant trend in recent Large Language Model (LLM) development. However, LLMs are not well suited to scenarios that require on-device processing, energy efficiency, a low memory footprint, and fast responses, all of which are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the challenge of designing accurate yet efficient Small Language Models (SLMs) for resource-constrained devices. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named \emph{MobiLlama}, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance at reduced resource demands. \emph{MobiLlama} is an SLM design that starts from a larger model and applies a careful parameter-sharing scheme to reduce both pre-training and deployment cost. Our work strives not only to bridge the gap in open-source SLMs but also to ensure full transparency: the complete training data pipeline, training code, model weights, and over 300 checkpoints, along with evaluation code, will be publicly released.
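
The abstract mentions a parameter-sharing scheme but gives no implementation detail here. Below is a minimal PyTorch sketch of one plausible such scheme, a single feed-forward network (FFN) instance reused across all transformer blocks, to illustrate how sharing shrinks the parameter count while keeping depth unchanged. The class names, dimensions, and the choice of sharing the FFN specifically are illustrative assumptions, not the paper's confirmed architecture.

```python
import torch
import torch.nn as nn

class SharedFFNBlock(nn.Module):
    """Transformer block that reuses an externally provided, shared FFN module."""

    def __init__(self, dim: int, n_heads: int, shared_ffn: nn.Module):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn = shared_ffn  # the same module object is passed to every block
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h)
        x = x + attn_out                  # residual around attention
        return x + self.ffn(self.norm2(x))  # residual around the shared FFN

# Illustrative sizes (hypothetical, not the paper's configuration).
dim, n_heads, n_layers = 512, 8, 12

# One FFN instance, reused by all blocks instead of one FFN per block.
shared_ffn = nn.Sequential(
    nn.Linear(dim, 4 * dim),
    nn.SiLU(),
    nn.Linear(4 * dim, dim),
)
blocks = nn.ModuleList(
    SharedFFNBlock(dim, n_heads, shared_ffn) for _ in range(n_layers)
)

# nn.Module.parameters() deduplicates shared tensors, so the shared FFN is
# counted once; the total is far below n_layers independent FFNs.
print(sum(p.numel() for p in blocks.parameters()))
```

The design choice this sketch illustrates: the FFN typically dominates a transformer block's parameter budget, so sharing it across blocks cuts model size (and hence pre-training and deployment cost) roughly in proportion to depth, while attention and normalization layers remain per-block.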
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Large Language Model, Efficiency
Contribution Types: Approaches low compute settings-efficiency
Languages Studied: English
Submission Number: 811