- TL;DR: This paper proposes a novel lightweight Transformer for character-level language modeling, utilizing group-wise operations.
- Abstract: Character-level language modeling is an essential but challenging task in Natural Language Processing. Prior works have focused on identifying long-term dependencies between characters and have built deeper and wider networks for better performance. However, their models require substantial computational resources, which hinders the usability of character-level language models in applications with limited resources. In this paper, we propose a lightweight model, called Group-Transformer, that reduces the resource requirements for a Transformer, a promising method for modeling sequence with long-term dependencies. Specifically, the proposed method partitions linear operations to reduce the number of parameters and computational cost. As a result, Group-Transformer only uses 18.2\% of parameters compared to the best performing LSTM-based model, while providing better performance on two benchmark tasks, enwik8 and text8. When compared to Transformers with a comparable number of parameters and time complexity, the proposed model shows better performance. The implementation code will be available.
- Keywords: Transformer, Lightweight model, Language Modeling, Character-level language modeling