Abstract: To reduce the computation cost and energy consumption of large language models (LLMs), skimming-based acceleration progressively drops unimportant tokens of the input sequence along the layers of the LLM while preserving tokens of semantic importance. However, our work reveals, for the first time, that this acceleration may be vulnerable to \textit{Denial-of-Service} (DoS) attacks. In this paper, we propose \textit{No-Skim}, a general framework that helps owners of skimming-based LLMs understand and measure the efficiency robustness of their acceleration schemes. Specifically, our framework searches for minimal and unnoticeable perturbations that yield adversarial inputs which sufficiently increase the remaining token ratio, thereby raising computation cost and energy consumption. When direct access to the model internals is unavailable, we further devise a time-based approximation algorithm that infers the remaining token ratio and serves as the loss oracle. We systematically evaluate the vulnerability of skimming acceleration across various LLM architectures, including BERT and RoBERTa, on the GLUE benchmark. In the worst case, the perturbations found by \textit{No-Skim} increase the running cost of the LLM by more than 106\% on average.
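To make the attack setting concrete, below is a minimal Python sketch of the black-box variant described in the abstract: wall-clock latency is used as a proxy for the remaining token ratio, and a greedy search keeps small input edits only if they slow the model down. The names (`remaining_token_ratio_proxy`, `greedy_efficiency_attack`, `model_fn`), the random character-insertion candidates, and all hyperparameters are illustrative assumptions, not the authors' actual algorithm.

```python
import time
import random
import string


def remaining_token_ratio_proxy(model_fn, text, n_runs=5):
    """Approximate the remaining-token ratio via average inference latency.

    Assumption of this sketch: with no access to model internals, longer
    wall-clock time indicates more tokens surviving the skimming layers.
    """
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        model_fn(text)  # black-box forward pass supplied by the caller
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def greedy_efficiency_attack(model_fn, text, max_edits=3, n_candidates=20):
    """Greedily search for a small perturbation that maximizes the latency proxy.

    Candidate edits here are random character insertions; a real framework
    would use more principled word/character-level substitutions.
    """
    best_text = text
    best_score = remaining_token_ratio_proxy(model_fn, text)
    for _ in range(max_edits):
        candidates = []
        for _ in range(n_candidates):
            pos = random.randrange(len(best_text) + 1)
            ch = random.choice(string.ascii_lowercase)
            candidates.append(best_text[:pos] + ch + best_text[pos:])
        scored = [(remaining_token_ratio_proxy(model_fn, c), c) for c in candidates]
        top_score, top_text = max(scored)
        if top_score > best_score:  # keep the edit only if it slows the model
            best_score, best_text = top_score, top_text
    return best_text, best_score
```

In a white-box setting, the latency proxy would simply be replaced by the exact remaining token ratio reported by the skimming layers, with the rest of the search loop unchanged.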
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: NLP engineering experiment
Languages Studied: English