Abstract: We introduce Jina Embeddings V3, a 570-million-parameter text embedding model that excels in long-context (up to 8192 tokens) and multilingual text retrieval tasks. The model incorporates task-specific Low-Rank Adaptation (LoRA) modules for high-quality embeddings specialized for retrieval, clustering, classification, and text matching. On the MTEB benchmark, Jina Embeddings V3 outperforms other embedding models of similar size.
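The abstract's mention of task-specific LoRA adapters suggests that the adapter is chosen at encoding time depending on the downstream task. The following is a minimal sketch of that usage pattern, assuming the public Hugging Face checkpoint `jinaai/jina-embeddings-v3` exposes an `encode` method with a `task` argument via `trust_remote_code`; the task identifiers used here are illustrative assumptions, not confirmed by the abstract.

```python
# Hedged usage sketch: selecting a task-specific LoRA adapter at encoding time.
# Assumes the checkpoint ships custom encoding code exposing `encode(texts, task=...)`;
# the task identifiers below are assumptions for illustration.
import numpy as np
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,  # custom encoding logic ships with the checkpoint
)

queries = ["How do LoRA adapters specialize embeddings?"]
passages = ["Low-Rank Adaptation adds small trainable matrices to a frozen backbone."]

# Asymmetric retrieval: distinct adapters for queries and passages (assumed identifiers).
query_vecs = model.encode(queries, task="retrieval.query")
passage_vecs = model.encode(passages, task="retrieval.passage")

# Other adapters named in the abstract: clustering, classification, text matching
# (identifiers below are likewise assumed).
cluster_vecs = model.encode(passages, task="separation")
match_vecs = model.encode(passages, task="text-matching")

# Cosine similarity between a query embedding and a passage embedding.
sim = float(
    np.dot(query_vecs[0], passage_vecs[0])
    / (np.linalg.norm(query_vecs[0]) * np.linalg.norm(passage_vecs[0]))
)
print(f"query-passage cosine similarity: {sim:.3f}")
```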