MOYU: Massive Over-activation Yielded Uplifts in LLMs

ACL ARR 2024 June Submission1975 Authors

15 Jun 2024 (modified: 12 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Massive Over-activation Yielded Uplifts (MOYU) is an inherent property of large language models, and dynamic activation (DA) built on the MOYU property is a clever but under-explored method for accelerating inference in large language models. Existing approaches that exploit MOYU typically face at least one major drawback, whether in maintaining model performance, enhancing inference speed, or broadening applicability across different architectures. This paper introduces two sequential DA methods, sTDA and sRIDA, that leverage sequence information while exploiting the MOYU property, effectively overcoming the "impossible triangle" that constrains current DA approaches. Our two schemes improve generation speed by 20-25\% without significantly compromising the model's task performance. Additionally, given the scarcity of theoretical studies on MOYU, this paper also explains its root cause, then outlines the mechanisms behind two main limitations faced by existing DA methods: history-related activation uncertainty and semantic-irrelevant activation inertia.
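To make the general idea of threshold-based dynamic activation concrete, the sketch below shows a minimal NumPy illustration: an MLP block computes its up-projection, masks neurons whose activation falls below a threshold `tau`, and restricts the down-projection to the surviving neurons. This is a generic illustration of DA-style sparsification, not the paper's sTDA or sRIDA methods; the function names, the ReLU activation, and the threshold parameter are assumptions for exposition only.

```python
import numpy as np

def mlp_forward_dense(x, W_up, W_down):
    # Standard two-layer MLP block: up-projection, ReLU, down-projection.
    h = np.maximum(x @ W_up, 0.0)
    return h @ W_down

def mlp_forward_threshold_da(x, W_up, W_down, tau=0.0):
    # Generic threshold-based dynamic activation (illustrative only):
    # keep only neurons whose activation exceeds tau, then restrict the
    # down-projection to the surviving rows, skipping the rest.
    h = np.maximum(x @ W_up, 0.0)
    idx = np.nonzero(h > tau)[0]      # indices of "hot" neurons
    return h[idx] @ W_down[idx, :]

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
W_up = rng.standard_normal((64, 256))
W_down = rng.standard_normal((256, 64))

y_dense = mlp_forward_dense(x, W_up, W_down)
y_sparse = mlp_forward_threshold_da(x, W_up, W_down, tau=0.0)
# With tau = 0, skipping ReLU-zeroed neurons is exact:
assert np.allclose(y_dense, y_sparse)
```

With `tau = 0` the sparse path is exact, since ReLU-zeroed neurons contribute nothing; raising `tau` trades accuracy for fewer active neurons, which is the speed/performance tension the abstract's "impossible triangle" refers to.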
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Dynamic Activation, Threshold, Router, MoE, Sparsity
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Theory
Languages Studied: English
Submission Number: 1975