Abstract: Mobile artificial intelligence has recently gained more attention due to the increasing computing power of mobile devices and applications in computer vision, natural language processing, and the Internet of Things. Although large pre-trained language models (e.g., BERT, GPT) have recently achieved state-of-the-art results on text classification tasks, they are not well suited for latency-critical applications on mobile devices. Therefore, it is essential to design tiny models with reduced memory and compute requirements. Model compression has shown promising results toward this goal. However, some significant challenges are yet to be addressed, such as information loss and adversarial robustness. This paper attempts to tackle these challenges through a new training scheme that minimizes information loss by maximizing the mutual information between the feature representations learned by the large and tiny models. In addition, we propose a certifiably robust defense method named GradMASK that masks a certain proportion of words in an input text. It can defend against both character-level perturbations and word substitution-based attacks. We perform extensive experiments demonstrating the effectiveness of our approach by comparing our tiny RNN models with compact RNNs (e.g., FastGRNN) and compressed RNNs (e.g., PRADO) in both clean and adversarial test settings.
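The abstract does not specify how the mutual information between the large and tiny models' representations is estimated. As one plausible reading, such an objective is often implemented with a contrastive (InfoNCE-style) lower bound on mutual information. The PyTorch sketch below is illustrative only; the function name `infonce_mi_loss`, the inputs `teacher_feats` and `student_feats`, and the `temperature` parameter are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def infonce_mi_loss(teacher_feats, student_feats, temperature=0.1):
    """Contrastive (InfoNCE-style) lower bound on the mutual information
    between the large (teacher) and tiny (student) feature representations.

    Minimizing this loss maximizes the bound, encouraging the tiny model to
    retain the information captured by the large model. Both inputs are
    (batch, dim) tensors assumed to be projected to a shared dimension.
    This is a hypothetical sketch, not the paper's exact estimator.
    """
    t = F.normalize(teacher_feats, dim=-1)
    s = F.normalize(student_feats, dim=-1)
    logits = s @ t.t() / temperature              # pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)  # matched pairs are positives
    return F.cross_entropy(logits, targets)
```

In a distillation setup of this kind, the loss above would typically be added to the tiny model's task loss (e.g., cross-entropy on labels) with a weighting coefficient, which is again an assumption about the training scheme rather than a stated detail of the paper.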