# Condensed: Evaluate on MIRACL

Summary: This tutorial demonstrates the implementation of a neural information retrieval system using the MIRACL dataset, focusing on dense retrieval techniques. It covers essential code for embedding-based search, including data loading from MIRACL corpus, generating embeddings using FlagModel (BGE), efficient similarity search with FAISS indexing, and evaluation using pytrec_eval. Key functionalities include batch processing of embeddings, vector similarity search, and computing standard IR metrics (NDCG, Recall). The tutorial is particularly useful for tasks involving document retrieval, semantic search implementation, and IR system evaluation, with specific guidance on handling large-scale datasets and optimizing performance through batching and caching strategies.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details for evaluating on MIRACL:

# MIRACL Evaluation Guide

## Key Setup
```python
pip install FlagEmbedding pytrec_eval
```

## Implementation Overview

### 1. Data Loading & Processing
```python
from datasets import load_dataset

# Load corpus and dev set
corpus = load_dataset("miracl/miracl-corpus", lang, trust_remote_code=True)['train']
dev = load_dataset('miracl/miracl', lang, trust_remote_code=True)['dev']

# Process corpus and queries
corpus_ids = corpus['docid']
corpus_text = [f"{doc['title']} {doc['text']}".strip() for doc in corpus]
queries_ids = dev['query_id']
queries_text = dev['query']
```

### 2. Embedding Generation
```python
from FlagEmbedding import FlagModel

model = FlagModel('BAAI/bge-base-en-v1.5')
queries_embeddings = model.encode_queries(queries_text)
corpus_embeddings = model.encode_corpus(corpus_text)
```

### 3. Indexing & Search
```python
import faiss
import numpy as np

# Create FAISS index
dim = corpus_embeddings.shape[-1]
index = faiss.index_factory(dim, 'Flat', faiss.METRIC_INNER_PRODUCT)
corpus_embeddings = corpus_embeddings.astype(np.float32)
index.train(corpus_embeddings)
index.add(corpus_embeddings)

# Search
all_scores = []
all_indices = []
for i in range(0, len(queries_embeddings), 32):
    j = min(i + 32, len(queries_embeddings))
    score, indice = index.search(queries_embeddings[i:j].astype(np.float32), k=100)
    all_scores.append(score)
    all_indices.append(indice)
```

### 4. Evaluation
```python
import pytrec_eval

# Configure evaluator
ndcg_string = "ndcg_cut." + ",".join([str(k) for k in [10,100]])
recall_string = "recall." + ",".join([str(k) for k in [10,100]])
evaluator = pytrec_eval.RelevanceEvaluator(qrels_dict, {ndcg_string, recall_string})

# Calculate metrics
scores = evaluator.evaluate(results)
```

## Important Notes

1. **Hardware Requirements**: GPU recommended - takes ~1 hour on 8xA100 40G
2. **Data Structure**:
   - Corpus entries: `docid`, `title`, `text`
   - Query entries: `query_id`, `query`, `positive_passages`, `negative_passages`
3. **Alternative Evaluation**: Can use FlagEmbedding's built-in evaluation:
```python
from FlagEmbedding.evaluation.miracl import MIRACLEvalRunner
runner = MIRACLEvalRunner(eval_args=eval_args, model_args=model_args)
runner.run()
```

## Best Practices

- Use batch processing for embeddings (default batch size: 1024)
- Process search in smaller chunks (32 queries at a time)
- Convert embeddings to float32 for FAISS compatibility
- Cache embeddings when possible for large-scale evaluations