Japanese SimCSE Technical Report

Published: 01 Jan 2023, Last Modified: 19 Feb 2025, CoRR 2023, CC BY-SA 4.0
Abstract: We report the development of Japanese SimCSE, a set of Japanese sentence embedding models fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese that can be used as a baseline in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for the Japanese SimCSE models and their evaluation results.