Keywords: Calibration, Zero-shot Cross-lingual Transfer, Uncertainty
Abstract: We investigate model calibration in zero-shot cross-lingual transfer with large-scale pre-trained language models. Calibration is an important metric for evaluating the trustworthiness of predictive models, and well-calibrated confidence estimates are essential when natural language models are deployed in critical tasks. We study different post-training calibration methods on structured and unstructured prediction tasks. We find that models trained on data from the source language become less calibrated when applied to the target language, and that calibration error increases with intrinsic task difficulty and the relative sparsity of training data. Moreover, we observe a potential connection between the level of calibration error and a previously proposed measure of the distance from English to other languages. Finally, our comparison demonstrates that, among the evaluated methods, Temperature Scaling (TS) and Gaussian Process Calibration (GPcalib) generalize well to distant languages, although TS fails to calibrate the more complex confidence estimates required in structured prediction.
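As background for the abstract above, the following is a minimal sketch of post-training Temperature Scaling, one of the calibration methods compared. It is not the paper's implementation: it fits a single scalar temperature T on held-out validation logits by grid search over the negative log-likelihood (implementations often use L-BFGS instead), then divides test-time logits by T before the softmax. All function names here are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def nll(logits, labels, T):
    # Negative log-likelihood of the temperature-scaled probabilities.
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(val_logits, val_labels, grid=None):
    # Pick the scalar T > 0 that minimizes validation NLL.
    # A simple grid search stands in for the usual L-BFGS optimization.
    if grid is None:
        grid = np.linspace(0.5, 5.0, 91)
    losses = [nll(val_logits, val_labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic demo: well-calibrated logits z, but the model outputs 3 * z,
# i.e. it is overconfident; the fitted temperature should recover T ~ 3.
rng = np.random.default_rng(0)
z = rng.normal(size=(2000, 4))
labels = np.array([rng.choice(4, p=p) for p in softmax(z)])
overconfident_logits = 3.0 * z
T = fit_temperature(overconfident_logits, labels)
```

Because T is a single parameter shared across all classes, TS rescales confidence without changing the argmax prediction, which is also why it cannot reshape the richer confidence estimates produced in structured prediction.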
Paper Type: long