Training and Evaluating Norwegian Sentence Embedding Models

Published: 20 Mar 2023, Last Modified: 14 Apr 2023, NoDaLiDa 2023
Keywords: sentence embeddings, contrastive learning, Norwegian
TL;DR: We train Norwegian SimCSE sentence embedding models and evaluate them on machine-translated STS data.
Abstract: We train and evaluate Norwegian sentence embedding models using the SimCSE contrastive learning methodology. We start from pre-trained Norwegian encoder models and train both unsupervised and supervised variants. The models are evaluated on a machine-translated version of semantic textual similarity datasets, as well as on binary classification tasks. We show that we can train strong Norwegian sentence embedding models that clearly outperform both the pre-trained encoder models and multilingual mBERT on the task of sentence similarity.
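For readers unfamiliar with the approach, the sketch below illustrates the unsupervised SimCSE objective the abstract refers to: the same batch of sentences is encoded twice with dropout active, and the two dropout-perturbed embeddings of each sentence serve as a positive pair in an in-batch contrastive (InfoNCE) loss. This is a minimal illustration, not the authors' training code; the checkpoint name "NbAiLab/nb-bert-base", the [CLS] pooling, and the temperature of 0.05 are assumptions for the example.

```python
# Minimal sketch of unsupervised SimCSE (Gao et al., 2021) on a Norwegian encoder.
# The specific checkpoint, pooling, and temperature are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "NbAiLab/nb-bert-base"  # assumed Norwegian encoder, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.train()  # keep dropout active: two passes give two "views" of each sentence

sentences = ["Dette er en setning.", "Her er en annen setning."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

def embed(inputs):
    # [CLS] pooling; other pooling strategies (mean pooling, etc.) are possible.
    return encoder(**inputs).last_hidden_state[:, 0]

# Two forward passes over the same batch differ only in their dropout masks,
# which yields the positive pairs used by unsupervised SimCSE.
z1, z2 = embed(batch), embed(batch)

temperature = 0.05  # assumed value
sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
labels = torch.arange(sim.size(0))
loss = F.cross_entropy(sim, labels)  # diagonal entries are positives, rest are in-batch negatives
loss.backward()
```

The supervised variant mentioned in the abstract follows the same contrastive recipe but takes its positive (and hard negative) pairs from labeled data rather than from dropout noise.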