# Generalizable and Composable Multi-Model Embedding Translation

## Abstract

Embedding translation enables interoperability across embedding models, allowing embedding vectors to be reused without costly re-embedding. However, existing methods are typically evaluated under simplified pairwise and in-domain settings and behave as black boxes at inference time, leading to unreliable performance under out-of-distribution (OOD) inputs, multi-model mixing, and composed translations.

We analyze embedding translation from a geometric perspective and derive an interpretable error bound that explains systematic error amplification under OOD inputs, mixing and chaining. Building on this, we propose a geometry-aware confidence metric and a **Hierarchical Mixture of Experts** framework with localized, parameter-efficient adaptation.

Following the MTEB leaderboard, we conduct large-scale experiments over **10** embedding models and **6** benchmarks across **90** translation directions. Our method outperforms *every* baseline for *every* model pair on *every* benchmark under OOD scenarios. Furthermore, multi-model mixing and chaining degrade our Recall@100 by only 0.5% to 2.6%, compared with a 7.2% to 92.3% recall drop for existing methods.


## Installation

```bash
# install uv from its official website
uv venv
uv sync
```


## Experiments

The evaluation uses six retrieval datasets from the BEIR benchmark: SciFact, NFCorpus, ArguAna, SciDocs, FiQA-2018, and Fever. The study covers 10 major embedding models: linq, e5, sfr, gritlm, kalm, nemotron, qwen, openai, mistral, and gemini.
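
The 90 translation directions mentioned in the abstract are consistent with every ordered (source, target) pair among the 10 models. A minimal sketch that enumerates them, assuming the short model names listed above are also the identifiers the CLI accepts (`MODELS` and `DIRECTIONS` are illustrative names, not part of the repository):

```python
from itertools import permutations

# Short model names as listed above; whether each is a valid CLI value is an assumption.
MODELS = ["linq", "e5", "sfr", "gritlm", "kalm",
          "nemotron", "qwen", "openai", "mistral", "gemini"]

# Every ordered (source, target) pair with source != target.
DIRECTIONS = list(permutations(MODELS, 2))

print(len(DIRECTIONS))  # 10 * 9 = 90
```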

### Pairwise OOD Translation

Train on a source domain (e.g., Fever) and evaluate on a different target domain (e.g., SciFact).

```bash
uv run python -m src.cli.multi_train multi-train \
  --set mapper.gating_moe.cluster_num=4 \
  --set model.source_model=gemini \
  --set model.target_model=openai \
  --set dataset.train_dataset_list=fever \
  --set dataset.test_dataset_list=scifact
```
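
To sweep several model pairs or datasets, the command above can be generated programmatically. The sketch below builds the same invocation from Python using only the `--set` overrides shown above; `pairwise_command` is a hypothetical helper and does not exist in the repository.

```python
import shlex

def pairwise_command(source, target, train, test, cluster_num=4):
    """Build the argv list for one pairwise run, using only the overrides shown above."""
    overrides = {
        "mapper.gating_moe.cluster_num": cluster_num,
        "model.source_model": source,
        "model.target_model": target,
        "dataset.train_dataset_list": train,
        "dataset.test_dataset_list": test,
    }
    argv = ["uv", "run", "python", "-m", "src.cli.multi_train", "multi-train"]
    for key, value in overrides.items():
        argv += ["--set", f"{key}={value}"]
    return argv

cmd = pairwise_command("gemini", "openai", train="fever", test="scifact")
print(shlex.join(cmd))  # shell-quoted form of the command
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would launch the run.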

### Multi-Model Mixing

Aggregate embeddings from heterogeneous source models into a shared target index.

```bash
uv run python -m src.cli.many_to_one train \
  --set target_model=openai \
  --set runs.0.model.source_model=gemini \
  --set runs.1.model.source_model=mistral \
  --set dataset.train_dataset_list=fever \
  --set dataset.test_dataset_list=scifact \
  --set mapper.gating_moe.cluster_num=4
```
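
Conceptually, mixing means that documents embedded by different source models are each translated into the target space and then searched together. The sketch below illustrates that idea with toy 2-D vectors; the lambda translators are stand-ins for trained translators, and `build_shared_index` and `search` are hypothetical helpers, not the repository's API.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_shared_index(corpora, translators):
    """Translate each source model's document vectors into the target space and pool them."""
    index = {}
    for source_model, docs in corpora.items():
        translate = translators[source_model]
        for doc_id, vec in docs.items():
            index[doc_id] = translate(vec)
    return index

def search(index, query_vec, k):
    """Return the ids of the k documents most similar to the query (cosine similarity)."""
    ranked = sorted(index, key=lambda d: cosine(query_vec, index[d]), reverse=True)
    return ranked[:k]

# Toy stand-ins for trained translators (assumption: real ones are learned models).
translators = {
    "gemini": lambda v: [v[1], v[0]],    # swap the two axes
    "mistral": lambda v: [v[0], -v[1]],  # flip the second axis
}
corpora = {
    "gemini": {"doc_a": [0.0, 1.0]},    # translated to [1.0, 0.0]
    "mistral": {"doc_b": [0.0, -1.0]},  # translated to [0.0, 1.0]
}

index = build_shared_index(corpora, translators)
top = search(index, query_vec=[1.0, 0.1], k=1)
print(top)
```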

### Multi-Model Chaining

Sequentially compose translators through an intermediate embedding space (e.g., A→B→C).

```bash
uv run python -m src.cli.transitivity run \
  --set dataset.train_dataset_list=fever \
  --set dataset.test_dataset_list=scifact \
  --set cases.0.run_ab.model.source_model=gemini \
  --set cases.0.run_ab.model.target_model=mistral \
  --set cases.0.run_bc.model.source_model=mistral \
  --set cases.0.run_bc.model.target_model=openai \
  --set cases.0.run_ac.model.source_model=gemini \
  --set cases.0.run_ac.model.target_model=openai
```
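
The three runs in each case (`run_ab`, `run_bc`, `run_ac`) appear to correspond to the two chained translators and the direct one. As a rough illustration of what is being compared, the sketch below composes two toy translators and measures the gap to a direct translator; all functions here are hypothetical stand-ins, not the trained models.

```python
def compose(*translators):
    """Chain translators left to right: compose(f, g)(x) == g(f(x))."""
    def chained(vec):
        for translate in translators:
            vec = translate(vec)
        return vec
    return chained

def max_abs_gap(u, v):
    """Largest coordinate-wise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

# Toy stand-ins for trained translators (for illustration only).
a_to_b = lambda v: [2.0 * x for x in v]
b_to_c = lambda v: [x + 1.0 for x in v]
a_to_c = lambda v: [2.0 * x + 1.0 for x in v]  # direct A→C translator

a_to_b_to_c = compose(a_to_b, b_to_c)
vec = [0.5, -1.0]
gap = max_abs_gap(a_to_b_to_c(vec), a_to_c(vec))
print(gap)
```

In this toy setup the chained and direct paths agree exactly. With learned translators each hop contributes its own error, which is why the chained path is evaluated against the direct one.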

### Project Structure

- `src/cli/`: Command-line interfaces for training, mixing, and transitivity experiments.
- `src/model/`: Core H-MOE implementation and base translator architectures.
- `src/mapper/`: Hierarchical clustering, LoRA adaptation, and query routing logic.
- `src/utils/`: Implementation of the Translation Confidence (TC) metric and geometric loss functions.


### Hyperparameters

Standard experimental configurations based on sensitivity analysis:

- Routing Threshold (τ): 0.8
- Local Loss Weight (α): 0.5
- Directional Constraint Weight (β): 0.7
- LoRA Rank (r): 8
- Cluster Count (k): 8 for Fever, 4 for other datasets

### Dataset Preparation

Because the full dataset is large (~900GB), we provide only a sample in the `data/processed/embeddings/` directory. To prepare the dataset, run the following command:

```bash
uv run python -m src.cli.generate_embeddings generate \
  --set model=gemini \
  --set datasets=arguana \
  --set output_dir=data/processed/embeddings/ \
  --set device=cuda \
  --set batch_size=128
```
We will also publish the full dataset after the paper is accepted.


