# Condensed: C-MTEB

Summary: This tutorial provides implementation guidance for C-MTEB (Chinese Multi-Task Embedding Benchmark), covering model integration and evaluation across multiple NLP tasks. It demonstrates how to implement embedding models using FlagDRESModel or SentenceTransformer, with specific focus on encoding methods for various tasks like classification, clustering, and retrieval. The tutorial details task-specific metrics (F1, v-measure, MRR@k, etc.), evaluation procedures, and leaderboard submission process. Key functionalities include batch processing, GPU acceleration, and handling different Chinese NLP tasks with appropriate metrics and evaluation approaches. This knowledge is particularly useful for implementing and evaluating embedding models on Chinese language tasks.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details:

# C-MTEB Implementation Guide

## Installation
```bash
pip install FlagEmbedding mteb
```

## Key Components

### 1. Task Types & Metrics
- **Classification**: Logistic regression training/testing (F1 metric)
- **Clustering**: Mini-batch k-means (batch_size=32, k=num_labels, v-measure metric)
- **Pair Classification**: Binary classification (average precision metric)
- **Reranking**: Query-based ranking (MRR@k, MAP metrics)
- **Retrieval**: Document retrieval (nDCG@k metric)
- **STS**: Sentence pair similarity (Spearman correlation metric)

### 2. Model Implementation

```python
# Option 1: Using FlagDRESModel (for retrieval tasks)
from C_MTEB.flag_dres_model import FlagDRESModel

model = FlagDRESModel(
    model_name_or_path="BAAI/bge-base-zh-v1.5",
    query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章：",
    pooling_method="cls"
)

# Option 2: Using SentenceTransformer
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("PATH_TO_MODEL")

# Option 3: Custom Implementation
class MyModel():
    def __init__(self):
        pass
    
    def encode(self, sentences, batch_size=32, **kwargs):
        """
        Args:
            sentences (List[str]): Input sentences
            batch_size (int): Batch size
        Returns:
            List[np.ndarray/tensor]: Embeddings
        """
        pass
```

### 3. Evaluation
```python
import mteb
from mteb import MTEB

# Available Chinese tasks
ChineseTaskList = [
    'TNews', 'IFlyTek', 'MultilingualSentiment', 'JDReview', 
    'OnlineShopping', 'Waimai', 'CLSClusteringS2S.v2', 
    # ... (other tasks)
]

# Run evaluation
tasks = mteb.get_tasks(ChineseTaskList)
for task in tasks:
    evaluation = MTEB(tasks=[task])
    evaluation.run(model, output_folder=f"zh_results/{model_name.split('/')[-1]}")
```

### 4. Leaderboard Submission
```bash
mteb create_meta --results_folder results/{model_name}/ --output_path model_card.md
```

## Best Practices
1. Use GPU for efficient evaluation
2. Results are stored in `zh_results/{model_name}/`
3. Submit to leaderboard by adding model_card.md content to HF Hub README.md

## Important Notes
- Model must implement the `encode()` method with proper batch processing
- Different tasks require different metrics and evaluation approaches
- Check [HF page](https://huggingface.co/C-MTEB) for detailed dataset information