# Condensed: Evaluate Reranker

Summary: This tutorial demonstrates how to implement and evaluate reranking models using the FlagEmbedding library, specifically focusing on BGE rerankers. It covers the technical setup and configuration of reranking evaluation pipelines, including crucial parameters like search_top_k, rerank_top_k, and batch sizes. The tutorial helps with tasks related to semantic search optimization, particularly the reranking of embedding-based search results. Key features include multi-GPU support, batch processing configuration, handling different model variants (BGE Reranker Large and V2 M3), and performance evaluation using metrics like NDCG and recall at various k values.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details:

# Reranker Evaluation Guide

## Key Concepts
- Rerankers provide better semantic matching but have O(N²) complexity
- Typically used to rerank top-k results from embedding-based retrieval
- Evaluation compares reranking performance on embedding model results

## Implementation Details

### Setup
```python
%pip install FlagEmbedding
```

### Evaluation Pipeline Configuration

Key parameters for both rerankers:
```bash
--dataset_names fiqa
--splits test dev
--search_top_k 1000    # Initial embedding search results
--rerank_top_k 100     # Number of results to rerank
--k_values 10 100      # Evaluation cutoff points
--eval_metrics ndcg_at_10 recall_at_100
--embedder_name_or_path BAAI/bge-large-en-v1.5
```

### 1. BGE Reranker Large
```bash
python -m FlagEmbedding.evaluation.beir \
--reranker_name_or_path BAAI/bge-reranker-large \
--embedder_batch_size 1024 \
--reranker_batch_size 1024 \
--devices cuda:0
```

### 2. BGE Reranker V2 M3
```bash
python -m FlagEmbedding.evaluation.beir \
--reranker_name_or_path BAAI/bge-reranker-v2-m3 \
--embedder_batch_size 1024 \
--reranker_batch_size 1024 \
--devices cuda:0 cuda:1 cuda:2 cuda:3 \
--reranker_max_length 1024
```

## Best Practices
1. Use GPU for better performance
2. Adjust batch sizes based on available memory
3. Configure reranker_max_length based on your data
4. Use multiple GPUs for larger models when available

## Important Notes
- Evaluation is computationally intensive
- bge-reranker-v2-m3 generally performs better on most metrics
- Results can be compared by examining the eval_results.json output

```python
# Compare results
import json
with open('beir/search_results/.../eval_results.json') as f:
    results = json.load(f)
```