Multilingual Compression Parity: How Efficiently Do Large Language Models Represent Information Across Languages?

Published: 18 Jun 2024, Last Modified: 19 Jul 2024
Venue: TF2M 2024 Poster
License: CC BY 4.0
Keywords: Multilingual Compression Parity, Information Theory, Multilingual LLMs, Multilingual Evaluations, Compression Parity
TL;DR: Compression Parity, a new information-theoretic metric that predicts the multilingual performance of LLMs.
Abstract: Large Language Models (LLMs) are increasingly deployed in user-facing applications worldwide, necessitating the handling of multiple languages across a variety of tasks. However, there is no single metric that can predict an LLM's multilingual capabilities. To address this gap, we propose Compression Parity (CP) – a metric based on Shannon's information measure – to assess the multilingual capabilities of an LLM in a task-agnostic manner. We evaluate CP on open-source LLMs (Llama2, Gemma, Mistral) and demonstrate a strong correlation with existing task-specific metrics from the literature – stronger than that of any existing metric we are aware of, e.g., tokenizer parity and fertility. These findings show that CP is a good predictor of an LLM's performance in a given language, so it may serve as a useful tool for ranking multilingual LLMs' capabilities regardless of the downstream task.
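The abstract does not spell out the formula, but one plausible reading of Compression Parity is the ratio of the bits per character a model needs to encode parallel text in a target language versus a reference language such as English. The sketch below illustrates that assumed definition with Hugging Face transformers; the model name, the tiny parallel sample, and the choice of English as the reference are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch: estimate per-language compression (bits per character) under a causal LM
# and a "compression parity" ratio vs. English. Illustrative reading of the metric,
# not the paper's exact definition or evaluation setup.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # assumed; any open causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def bits_per_char(text: str) -> float:
    """Total cross-entropy of the text in bits, normalized by character count."""
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean NLL in nats per predicted token; convert to total bits.
    n_predicted = enc["input_ids"].shape[1] - 1
    total_bits = out.loss.item() * n_predicted / math.log(2)
    return total_bits / len(text)

# Parallel sentences (same content in each language) -- tiny illustrative sample.
parallel = {
    "en": "The cat sat on the mat and watched the rain.",
    "de": "Die Katze saß auf der Matte und beobachtete den Regen.",
    "tr": "Kedi paspasın üzerinde oturdu ve yağmuru izledi.",
}

bpc = {lang: bits_per_char(text) for lang, text in parallel.items()}
for lang, value in bpc.items():
    # A parity near 1.0 means the model compresses the language about as well as English.
    print(f"{lang}: {value:.3f} bits/char, parity vs. en = {bpc['en'] / value:.3f}")
```

In practice one would average over a much larger parallel corpus; the single-sentence sample here only serves to show the shape of the computation.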
Submission Number: 32