Group-Transformer: Towards A Lightweight Character-level Language Model

Sungrae Park; Geewook Kim; Junyeop Lee; Junbum Cha; Ji-Hoon Kim; Hwalsuk Lee

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Blind Submission · Readers: Everyone

TL;DR: This paper proposes a novel lightweight Transformer for character-level language modeling, utilizing group-wise operations.

Abstract: Character-level language modeling is an essential but challenging task in Natural Language Processing. Prior works have focused on identifying long-term dependencies between characters and have built deeper and wider networks for better performance. However, these models require substantial computational resources, which hinders the use of character-level language models in resource-limited applications. In this paper, we propose a lightweight model, called Group-Transformer, that reduces the resource requirements of a Transformer, a promising method for modeling sequences with long-term dependencies. Specifically, the proposed method partitions linear operations to reduce the number of parameters and the computational cost. As a result, Group-Transformer uses only 18.2% of the parameters of the best-performing LSTM-based model, while providing better performance on two benchmark tasks, enwik8 and text8. The proposed model also outperforms Transformers with a comparable number of parameters and time complexity. The implementation code will be available.

Keywords: Transformer, Lightweight model, Language Modeling, Character-level language modeling
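
To illustrate the abstract's claim that partitioning linear operations reduces parameters, here is a minimal sketch of a generic grouped linear layer. It is not the authors' implementation: the abstract does not say how groups are formed or recombined, so the function names, the even split of features into contiguous groups, and the omission of bias terms are all assumptions made for illustration.

```python
def dense_param_count(d_in, d_out):
    """Weights in an ordinary (unpartitioned) linear map, bias omitted."""
    return d_in * d_out


def grouped_param_count(d_in, d_out, groups):
    """Weights when inputs and outputs are split evenly into `groups` blocks.

    Each group maps d_in/groups features to d_out/groups features, so the
    total is groups * (d_in/groups) * (d_out/groups) = d_in * d_out / groups.
    """
    if d_in % groups or d_out % groups:
        raise ValueError("d_in and d_out must be divisible by groups")
    return groups * (d_in // groups) * (d_out // groups)


def grouped_linear(x, weights):
    """Apply one small weight matrix per group (hypothetical helper).

    x:       list of d_in numbers
    weights: list of `groups` matrices, each with d_in/groups rows
             and d_out/groups columns (nested lists)
    """
    groups = len(weights)
    size = len(x) // groups
    out = []
    for g, w in enumerate(weights):
        chunk = x[g * size:(g + 1) * size]  # this group's slice of the input
        n_out = len(w[0])
        out.extend(
            sum(chunk[i] * w[i][j] for i in range(size)) for j in range(n_out)
        )
    return out


if __name__ == "__main__":
    print(dense_param_count(512, 512))       # dense weight count
    print(grouped_param_count(512, 512, 4))  # one quarter of the dense count
```

With this scheme the weight count and the multiply-add count both shrink by a factor equal to the number of groups, at the cost that features in different groups no longer interact inside the layer.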
