Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing

Asish Ghoshal; Xilun Chen; Sonal Gupta; Luke Zettlemoyer; Yashar Mehdad

Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing

Asish Ghoshal, Xilun Chen, Sonal Gupta, Luke Zettlemoyer, Yashar Mehdad

Published: 12 Jan 2021, Last Modified: 05 May 2023ICLR 2021 PosterReaders: Everyone

Keywords: label smoothing, calibration, semantic parsing, structured prediction

Abstract: Training with soft targets instead of hard targets has been shown to improve performance and calibration of deep neural networks. Label smoothing is a popular way of computing soft targets, where one-hot encoding of a class is smoothed with a uniform distribution. Owing to its simplicity, label smoothing has found wide-spread use for training deep neural networks on a wide variety of tasks, ranging from image and text classification to machine translation and semantic parsing. Complementing recent empirical justification for label smoothing, we obtain PAC-Bayesian generalization bounds for label smoothing and show that the generalization error depends on the choice of the noise (smoothing) distribution. Then we propose low-rank adaptive label smoothing (LORAS): a simple yet novel method for training with learned soft targets that generalizes label smoothing and adapts to the latent structure of the label space in structured prediction tasks. Specifically, we evaluate our method on semantic parsing tasks and show that training with appropriately smoothed soft targets can significantly improve accuracy and model calibration, especially in low-resource settings. Used in conjunction with pre-trained sequence-to-sequence models, our method achieves state of the art performance on four semantic parsing data sets. LORAS can be used with any model, improves performance and implicit model calibration without increasing the number of model parameters, and can be scaled to problems with large label spaces containing tens of thousands of labels.

One-sentence Summary: We propose an extension of label smoothing which improves generalization performance by adapting to the structure present in label space of structured prediction tasks.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Data: [SNIPS](https://paperswithcode.com/dataset/snips), [TOPv2](https://paperswithcode.com/dataset/topv2)

10 Replies

Loading