Summary: This tutorial provides implementation guidance for BGE Rerankers, focusing on model initialization and scoring computation across different versions. It covers techniques for working with various reranker models (base, large, v2 series) using the FlagEmbedding library, including specialized implementations like LayerWise and LightWeight rerankers. The tutorial helps with tasks involving text pair scoring and reranking, particularly for search and retrieval applications. Key features include multilingual support, performance optimization techniques (FP16, layer selection, compression), and model selection guidelines based on specific use cases, with code examples demonstrating different initialization patterns and scoring methods for each model variant.

# BGE Reranker

Like embedding models, BGE has a group of rerankers with various sizes and functionalities. In this tutorial, we will introduce the BGE rerankers series.

## 0. Installation

Install the dependencies in the environment.


```python
%pip install -U FlagEmbedding
```

## 1. bge-reranker

The first generation of BGE reranker contains two models:

| Model  | Language |   Parameters   |    Description    |   Base Model     |
|:-------|:--------:|:----:|:-----------------:|:--------------------------------------:|
| [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base)   |   Chinese and English |     278M     |  a cross-encoder model which is more accurate but less efficient     |  XLM-RoBERTa-Base  |
| [BAAI/bge-reranker-large](https://huggingface.co/BAAI/bge-reranker-large) |   Chinese and English |     560M     |   a cross-encoder model which is more accurate but less efficient    |  XLM-RoBERTa-Large  |


```python
from FlagEmbedding import FlagReranker

model = FlagReranker(
    'BAAI/bge-reranker-large',
    use_fp16=True,
    devices=["cuda:0"],   # if you don't have GPUs, you can use "cpu"
)

pairs = [
    ["What is the capital of France?", "Paris is the capital of France."],
    ["What is the capital of France?", "The population of China is over 1.4 billion people."],
    ["What is the population of China?", "Paris is the capital of France."],
    ["What is the population of China?", "The population of China is over 1.4 billion people."]
]

scores = model.compute_score(pairs)
scores
```

## 2. bge-reranker v2

| Model  | Language |   Parameters   |    Description    |   Base Model     |
|:-------|:--------:|:----:|:-----------------:|:--------------------------------------:|
| [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) | Multilingual |     568M     | a lightweight cross-encoder model, possesses strong multilingual capabilities, easy to deploy, with fast inference. | XLM-RoBERTa-Large |
| [BAAI/bge-reranker-v2-gemma](https://huggingface.co/BAAI/bge-reranker-v2-gemma) | Multilingual |     2.51B     | a cross-encoder model which is suitable for multilingual contexts, performs well in both English proficiency and multilingual capabilities. | Gemma2-2B |
| [BAAI/bge-reranker-v2-minicpm-layerwise](https://huggingface.co/BAAI/bge-reranker-v2-minicpm-layerwise) | Multilingual |    2.72B    | a cross-encoder model which is suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers for output, facilitating accelerated inference. | MiniCPM |
| [BAAI/bge-reranker-v2.5-gemma2-lightweight](https://huggingface.co/BAAI/bge-reranker-v2.5-gemma2-lightweight) | Multilingual |    9.24B    | a cross-encoder model which is suitable for multilingual contexts, performs well in both English and Chinese proficiency, allows freedom to select layers, compress ratio and compress layers for output, facilitating accelerated inference. | Gemma2-9B |

### bge-reranker-v2-m3

bge-reranker-v2-m3 is trained based on bge-m3, introducing great multi-lingual capability as keeping a slim model size.


```python
from FlagEmbedding import FlagReranker

# Setting use_fp16 to True speeds up computation with a slight performance degradation (if using gpu)
reranker = FlagReranker('BAAI/bge-reranker-v2-m3', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
# or set "normalize=True" to apply a sigmoid function to the score for 0-1 range
score = reranker.compute_score(['query', 'passage'], normalize=True)

print(score)
```

### bge-reranker-v2-gemma

bge-reranker-v2-gemma is trained based on gemma-2b. It has excellent performances with both English proficiency and multilingual capabilities.


```python
from FlagEmbedding import FlagLLMReranker

reranker = FlagLLMReranker('BAAI/bge-reranker-v2-gemma', devices=["cuda:0"], use_fp16=True)

score = reranker.compute_score(['query', 'passage'])
print(score)
```

### bge-reranker-v2-minicpm-layerwise

bge-reranker-v2-minicpm-layerwise is trained based on minicpm-2b-dpo-bf16. It's suitable for multi-lingual contexts, performs well in Both English and Chinese proficiency.

Another special functionality is the layerwise design gives user freedom to select layers for output, facilitating accelerated inference.


```python
from FlagEmbedding import LayerWiseFlagLLMReranker

reranker = LayerWiseFlagLLMReranker('BAAI/bge-reranker-v2-minicpm-layerwise', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28])
print(score)
```

### bge-reranker-v2.5-gemma2-lightweight

bge-reranker-v2.5-gemma2-lightweight is trained based on gemma2-9b. It's also suitable for multi-lingual contexts.

Besides the layerwise reduction functionality, bge-reranker-v2.5-gemma2-lightweight integrates token compression capabilities to further save more resources while maintaining outstanding performances.


```python
from FlagEmbedding import LightWeightFlagLLMReranker

reranker = LightWeightFlagLLMReranker('BAAI/bge-reranker-v2.5-gemma2-lightweight', devices=["cuda:0"], use_fp16=True)

# Adjusting 'cutoff_layers' to pick which layers are used for computing the score.
score = reranker.compute_score(['query', 'passage'], cutoff_layers=[28], compress_ratio=2, compress_layers=[24, 40])
print(score)
```

## Comparison

BGE reranker series provides a great number of choices for all kinds of functionalities. You can select the model according your senario and resource:

- For multilingual, utilize `BAAI/bge-reranker-v2-m3`, `BAAI/bge-reranker-v2-gemma` and `BAAI/bge-reranker-v2.5-gemma2-lightweight`.

- For Chinese or English, utilize `BAAI/bge-reranker-v2-m3` and `BAAI/bge-reranker-v2-minicpm-layerwise`.

- For efficiency, utilize `BAAI/bge-reranker-v2-m3` and the low layer of `BAAI/bge-reranker-v2-minicpm-layerwise`.

- For saving resources and extreme efficiency, utilize `BAAI/bge-reranker-base` and `BAAI/bge-reranker-large`.

- For better performance, recommand `BAAI/bge-reranker-v2-minicpm-layerwise` and B`AAI/bge-reranker-v2-gemma`.

Make sure always test on your real use case and choose the one with best speed-quality balance!
