Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: text generation, text degeneration, language model, summarization, image captioning
Abstract: Advanced large-scale neural language models have led to significant success in many natural language generation tasks. However, the most commonly used training objective, Maximum Likelihood Estimation (MLE), has been shown to be problematic: models trained with MLE tend to prefer dull and repetitive phrases. In this work, we introduce ScaleGrad, a modification applied straight to the gradient of the loss function, to remedy the degeneration issues of the standard MLE objective. By directly manipulating the gradient information, ScaleGrad teaches the model to use novel tokens during training. Empirical results show the effectiveness of our method not only in open-ended generation but also in directed generation. Because it requires no architectural changes, our method can serve as a general training objective applicable to most neural text generation tasks.
One-sentence Summary: We propose a simple modification to MLE based on gradient analysis and achieve significant reductions in token-level degeneration across different tasks.
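
The abstract does not spell out the exact mechanism, so the following is only a rough, non-authoritative sketch of what a gradient-rescaling objective of this flavor could look like, not the authors' released implementation. It assumes that "novel" tokens (here taken to mean tokens not yet used in the prefix, tracked by a hypothetical novel_mask) have their predicted probabilities rescaled by an illustrative factor gamma before renormalization and cross-entropy, which changes the gradient that flows back to the logits. All names (scalegrad_style_loss, novel_mask, gamma) are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def scalegrad_style_loss(logits, targets, novel_mask, gamma=0.2):
    # Hypothetical sketch of a gradient-rescaling cross-entropy loss.
    # logits:     (batch, vocab) unnormalized scores from the LM head
    # targets:    (batch,) ground-truth next-token ids
    # novel_mask: (batch, vocab) bool, True where a token has not yet been
    #             used in the prefix (hypothetical bookkeeping, assumed given)
    # gamma:      illustrative rescaling factor, assumed to lie in (0, 1]
    probs = F.softmax(logits, dim=-1)
    # Rescale probabilities of novel tokens, then renormalize to a distribution.
    scaled = torch.where(novel_mask, gamma * probs, probs)
    scaled = scaled / scaled.sum(dim=-1, keepdim=True)
    # Cross-entropy on the rescaled distribution; because the rescaling is part
    # of the computation graph, it alters the gradient reaching the logits.
    target_prob = scaled.gather(1, targets.unsqueeze(1)).squeeze(1)
    return -torch.log(target_prob + 1e-9).mean()

# Illustrative usage with random tensors:
logits = torch.randn(4, 100, requires_grad=True)
targets = torch.randint(0, 100, (4,))
novel_mask = torch.rand(4, 100) > 0.5
loss = scalegrad_style_loss(logits, targets, novel_mask)
loss.backward()
```
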
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=YHyXKAKpbx
