# Condensed: MTEB

Summary: This tutorial demonstrates how to implement MTEB (Massive Text Embedding Benchmark) evaluation for embedding models using sentence-transformers. It provides code for setting up MTEB evaluations, loading pre-trained models, selecting specific NLP tasks (like retrieval tasks), and running benchmarks with detailed metrics. The tutorial helps with tasks involving model evaluation, embedding quality assessment, and performance benchmarking, covering key functionalities such as MAP, MRR, NDCG scoring, and multi-task evaluation. It's particularly useful for implementing systematic evaluation pipelines for text embedding models and understanding their performance across different retrieval tasks.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details:

# MTEB Evaluation Tutorial

## Installation
```python
pip install sentence_transformers mteb
```

## Key Implementation Details

1. **Basic Setup**
```python
import mteb
from sentence_transformers import SentenceTransformer

# Load model
model_name = "BAAI/bge-base-en-v1.5"
model = SentenceTransformer(model_name)
```

2. **Task Selection**
```python
# Example of retrieval tasks (showing partial list)
retrieval_tasks = [
    "ArguAna",
    "ClimateFEVER",
    "DBPedia",
    "FEVER",
    # ... other tasks
]

# Select specific tasks for evaluation
tasks = mteb.get_tasks(tasks=retrieval_tasks[:1])  # Using only ArguAna for demo
```

3. **Running Evaluation**
```python
# Initialize MTEB with selected tasks
evaluation = mteb.MTEB(tasks=tasks)

# Run evaluation
results = evaluation.run(model, output_folder="results")
```

## Important Notes

- Results are stored in `{output_folder}/{model_name}/{model_revision}/{task_name}.json`
- Evaluation metrics include MAP, MRR, NDCG, Precision, and Recall at different cutoffs (1, 3, 5, 10, 20, 100, 1000)
- MTEB supports various NLP tasks and languages (full list available in [MTEB documentation](https://github.com/embeddings-benchmark/mteb/blob/main/docs/tasks.md))

## Output Format
Results JSON contains:
- Dataset information
- Evaluation time
- Detailed scores including:
  - MAP scores
  - MRR scores
  - NDCG scores
  - Precision/Recall at different cutoffs
  - NAUC metrics

## Best Practices
- Start with a single task for testing
- Use full task set for comprehensive evaluation
- Consider computational resources when selecting tasks
- Check MTEB documentation for specific task requirements