# Condensed: BGE-M3

Summary: This tutorial provides implementation details for BGE-M3, a multi-modal retrieval model, covering three key retrieval techniques: dense, sparse, and multi-vector processing. It demonstrates how to initialize the model using FlagEmbedding library and implement hybrid ranking by combining different retrieval scores. The tutorial includes code snippets for computing embeddings, similarity scores, and lexical matching, with specific formulas for each approach. It's particularly useful for tasks involving text similarity search, document retrieval, and reranking, featuring optimization techniques like FP16 support and max_length adjustments. The implementation is based on XLM-RoBERTa-large and includes best practices for balancing computational costs with retrieval effectiveness.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details:

# BGE-M3 Implementation Guide

## Installation
```python
pip install -U transformers FlagEmbedding accelerate
```

## Core Components

### 1. Model Initialization
```python
from FlagEmbedding import BGEM3FlagModel
model = BGEM3FlagModel('BAAI/bge-m3', use_fp16=True)
```

### 2. Key Functionalities

#### 2.1 Dense Retrieval
- Uses normalized [CLS] token hidden state
- Formula: `s_dense = f_sim(e_p, e_q)`

```python
embeddings = model.encode(sentences, max_length=10)['dense_vecs']
similarity_scores = embeddings_1 @ embeddings_2.T
```

#### 2.2 Sparse Retrieval
- Generates sparse embeddings using ReLU activation
- Formula: `w_qt = ReLU(W_lex^T H_q[i])`

```python
output = model.encode(sentences, return_sparse=True)
# Get lexical matching score
lex_score = model.compute_lexical_matching_score(output_1['lexical_weights'][0], 
                                               output_2['lexical_weights'][0])
```

#### 2.3 Multi-Vector Processing
- Utilizes complete output embeddings
- Formula: `E_q = norm(W_mul^T H_q)`

```python
output = model.encode(sentences, 
                     return_dense=True, 
                     return_sparse=True, 
                     return_colbert_vecs=True)
mul_score = model.colbert_score(output_1['colbert_vecs'][0], 
                              output_2['colbert_vecs'][0]).item()
```

#### 2.4 Hybrid Ranking
Formula: `s_rank = w1*s_dense + w2*s_lex + w3*s_mul`

```python
final_score = (1/3 * dense_score + 
              1/3 * lexical_score + 
              1/3 * multi_vector_score)
```

## Important Notes
1. Base model: XLM-RoBERTa-large
2. Adjust `max_length` parameter for performance optimization
3. Weights in hybrid ranking should be tuned based on use case
4. Multi-vector processing is computationally intensive - best used for reranking

## Best Practices
1. Use FP16 for better performance
2. Implement initial filtering with dense/sparse retrieval before multi-vector
3. Customize weight distribution in hybrid ranking based on specific requirements
4. Consider computational costs when choosing between different retrieval methods