Benchmarking Pretrained Language Models for Italian Natural Language Understanding

Anonymous

04 Mar 2022 (modified: 05 May 2023) · Submitted to NLP for ConvAI
Keywords: NLU, low-resource languages, conversational systems
TL;DR: We address the issue of creating effective NLU components for lower-resource languages, presenting a new benchmark for Italian NLU.
Abstract: Since the advent of Transformer-based, pretrained language models (LMs) such as BERT, Natural Language Understanding (NLU) components in the form of Dialogue Act Recognition (DAR) and Slot Recognition (SR) for dialogue systems have become both more accurate and easier to create for specific application domains. Unsurprisingly, however, much of this progress has been limited to the English language due to the existence of very large datasets in both dialogue and written form. In this paper, we use the newly released JILDA dataset to benchmark three of the most recent pretrained LMs: Italian BERT, Multilingual BERT, and AlBERTo. Results show that the monolingual version of BERT performs better than both the multilingual one and AlBERTo. This paper highlights the challenges that still remain in creating effective NLU components for lower-resource languages, and constitutes a first step in improving NLU for Italian dialogue.
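
As a rough illustration of the kind of benchmarking setup the abstract describes, the sketch below frames the two NLU tasks with Hugging Face Transformers: Dialogue Act Recognition as utterance-level classification and Slot Recognition as token-level (BIO) classification over a pretrained Italian BERT checkpoint. The checkpoint name, label inventories, and example utterance are illustrative assumptions, not the paper's actual configuration or the JILDA annotation scheme.

```python
# Minimal sketch (not the paper's code) of the two JILDA-style NLU tasks:
# Dialogue Act Recognition (DAR) as sequence classification and
# Slot Recognition (SR) as token classification.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
)

checkpoint = "dbmdz/bert-base-italian-cased"     # assumed Italian BERT checkpoint
dialogue_acts = ["inform", "request", "greet"]   # hypothetical DAR label set
slot_labels = ["O", "B-location", "I-location"]  # hypothetical BIO slot tags

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# DAR: one dialogue-act label per utterance.
dar_model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(dialogue_acts)
)

# SR: one BIO tag per token.
sr_model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(slot_labels)
)

utterance = "Cerco un lavoro a Firenze."  # illustrative Italian utterance
inputs = tokenizer(utterance, return_tensors="pt")

dar_logits = dar_model(**inputs).logits  # shape: (1, num_dialogue_acts)
sr_logits = sr_model(**inputs).logits    # shape: (1, seq_len, num_slot_labels)
```

Benchmarking the other two models would amount to swapping the checkpoint name, e.g. "bert-base-multilingual-cased" for Multilingual BERT or the corresponding AlBERTo checkpoint, and fine-tuning each model on the JILDA training split before comparing task scores.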