Multilingual Compression Parity: How Efficiently Do Large Language Models Represent Information Across Languages?
Keywords: Multilingual Compression Parity, Information Theory, Multilingual LLMs, Multilingual Evaluations, Compression Parity
TL;DR: Compression Parity, a new metric based on information theory that predicts the multilingual performance of LLMs.
Abstract: Large Language Models (LLMs) are increasingly deployed in user-facing applications worldwide, necessitating the handling of multiple languages across a variety of tasks. However, no single metric can predict an LLM's multilingual capabilities. To address this gap, we propose Compression Parity (CP), a metric based on Shannon's information measure, to assess the multilingual capabilities of an LLM in a task-agnostic manner. We evaluate CP on open-source LLMs (Llama2, Gemma, Mistral) and demonstrate a strong correlation with existing task-specific metrics from the literature, stronger than that of any existing metric we are aware of, e.g., tokenizer parity and fertility. These findings show that CP is a good predictor of an LLM's performance in a given language; hence it may serve as a useful tool for ranking multilingual LLMs' capabilities regardless of the downstream task.
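A minimal sketch of how a compression-parity-style score could be computed is given below. The abstract does not spell out the exact definition of CP, so this sketch assumes CP is the ratio of the total bits (negative log2-likelihood) a model assigns to an English text versus a parallel text in another language, with 1.0 indicating parity; the model name, example sentences, and the `total_bits` / `compression_parity` helpers are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: a compression-parity-style score with HuggingFace transformers.
# Assumption: CP(L) = bits(English text) / bits(parallel text in language L).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def total_bits(model, tokenizer, text: str) -> float:
    """Negative log2-likelihood (in bits) that the model assigns to `text`."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # `out.loss` is the mean cross-entropy per predicted token, in nats;
    # multiply by the number of predicted tokens and convert nats -> bits.
    n_predicted = enc["input_ids"].shape[1] - 1
    return out.loss.item() * n_predicted / math.log(2)

def compression_parity(model, tokenizer, english_text: str, other_text: str) -> float:
    """Assumed CP: bits for the English side divided by bits for the parallel side."""
    return total_bits(model, tokenizer, english_text) / total_bits(model, tokenizer, other_text)

if __name__ == "__main__":
    name = "mistralai/Mistral-7B-v0.1"  # any open LLM named in the abstract would do
    tok = AutoTokenizer.from_pretrained(name)
    lm = AutoModelForCausalLM.from_pretrained(name)
    cp = compression_parity(lm, tok,
                            "The cat sat on the mat.",
                            "Die Katze saß auf der Matte.")
    print(f"Compression parity (toy example): {cp:.3f}")
```

In practice such a score would be averaged over a parallel corpus rather than single sentences, so that tokenizer effects and sentence-level noise wash out.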
Submission Number: 32