STRA: A Simple Token Replacement Strategy Alleviating Exposure Bias in Text Generation

Published: 01 Jan 2024 · Last Modified: 05 Mar 2025 · ICME 2024 · CC BY-SA 4.0
Abstract: In text generation, models are typically trained on ground-truth tokens, so erroneous tokens never appear in the training inputs. During inference, however, erroneous tokens are inevitably generated, and these errors accumulate over autoregressive decoding, a problem known as exposure bias. To address this problem directly, we propose a token replacement strategy with a sequentially decaying rate that simulates inference conditions during training. First, during training we randomly replace tokens in the target text with a certain probability, mimicking the erroneous tokens that may arise at inference time. Second, we decay the replacement probability from the start of the sequence to the end: because earlier tokens condition the generation of all subsequent tokens, errors at the beginning exert a stronger influence, so they are simulated more often. Our method improves the base models' performance on image captioning, text summarization, and dialogue datasets, and achieves state-of-the-art results on the query-focused summarization dataset.
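To make the strategy concrete, below is a minimal PyTorch sketch of position-decayed token replacement as the abstract describes it. The function name `replace_with_decay`, the geometric decay schedule, and the parameters `base_prob` and `decay` are illustrative assumptions, not details taken from the paper.

```python
import torch

def replace_with_decay(targets: torch.Tensor, vocab_size: int,
                       base_prob: float = 0.2, decay: float = 0.95) -> torch.Tensor:
    """Randomly replace target tokens with a probability that decays
    from the start of the sequence to the end (illustrative sketch).

    targets: (batch, seq_len) ground-truth token ids.
    Returns a corrupted copy to use as decoder input during training.
    """
    batch, seq_len = targets.shape
    # Position-wise replacement probability: higher for earlier tokens,
    # since early errors propagate further in autoregressive decoding.
    # The geometric schedule base_prob * decay**t is an assumption.
    positions = torch.arange(seq_len, device=targets.device)
    probs = base_prob * decay ** positions              # shape (seq_len,)
    # Sample a per-token Bernoulli mask from the position-wise probabilities.
    mask = torch.rand(batch, seq_len, device=targets.device) < probs
    # Draw random tokens to stand in for the erroneous tokens of inference.
    noise = torch.randint(0, vocab_size, (batch, seq_len),
                          device=targets.device)
    return torch.where(mask, noise, targets)
```

In use, the corrupted sequence would feed the decoder while the loss is still computed against the original `targets`, so the model learns to recover from the kinds of mistakes it will make at inference time.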