MeRino: Entropy-driven Design for Mobile-friendly Generative Language Models

23 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: deep learning architecture, autoregressive transformers
TL;DR: We present a novel information-entropy framework for designing mobile-friendly generative language models, termed MeRino.
Abstract: Generative Large Language Models (LLMs) stand as a revolutionary advancement in the modern era of artificial intelligence (AI). However, deploying LLMs on resource-constrained devices is difficult due to their high computational cost. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative language models. Our key design paradigm is to maximize the entropy of transformer decoders within a given computational budget. The entire design procedure reduces to solving a mathematical programming (MP) problem, which can be done on a CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across twelve NLP downstream tasks, showing competitive performance against state-of-the-art autoregressive transformer models in the mobile setting. Notably, MeRino achieves similar or better zero-shot and one-shot performance compared to the 350M-parameter OPT while being 4.9$\times$ faster on mobile devices with a 5.5$\times$ reduction in model size.
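Since the abstract does not spell out the entropy measure or the MP formulation, the following is a minimal, hypothetical sketch of the described design loop: enumerate decoder configurations, discard those over a FLOPs budget, and keep the one with the highest entropy score. The `entropy_proxy` and `decoder_flops` functions below are illustrative assumptions, not the paper's actual definitions.

```python
# Hypothetical sketch of the entropy-driven design procedure from the abstract.
# The entropy proxy and FLOPs model here are illustrative stand-ins.
import math
from itertools import product

def decoder_flops(depth: int, width: int, seq_len: int = 1024) -> float:
    """Rough per-token FLOPs for a stack of decoder blocks:
    attention (~4*w^2 + 2*w*L) plus a 4x-expansion MLP (~8*w^2)."""
    per_block = 4 * width**2 + 2 * width * seq_len + 8 * width**2
    return depth * per_block

def entropy_proxy(depth: int, width: int) -> float:
    """Toy information-entropy score that grows with both depth and width;
    stands in for whatever entropy measure the paper maximizes."""
    return depth * math.log(width)

def design_under_budget(flops_budget: float):
    """Solve the budgeted entropy-maximization problem by brute-force
    enumeration -- the search space is small enough that this finishes
    on a CPU in well under a minute."""
    best, best_score = None, -math.inf
    for depth, width in product(range(2, 33, 2), range(128, 2049, 128)):
        if decoder_flops(depth, width) > flops_budget:
            continue
        score = entropy_proxy(depth, width)
        if score > best_score:
            best, best_score = (depth, width), score
    return best, best_score

if __name__ == "__main__":
    config, score = design_under_budget(flops_budget=5e8)
    print(f"depth={config[0]}, width={config[1]}, entropy proxy={score:.2f}")
```

In a faithful implementation, the entropy objective and the budget constraint would follow the paper's definitions, and the MP problem could be handed to an off-the-shelf solver rather than enumerated.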
Supplementary Material: pdf
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7730