Summary: This tutorial demonstrates how to implement information retrieval evaluation using Sentence Transformers, specifically focusing on the BeIR Quora dataset. It covers techniques for loading and preprocessing evaluation datasets, creating efficient corpus subsets, and establishing query-document relevance mappings. The tutorial helps with tasks like setting up IR evaluation pipelines, managing large document collections, and computing retrieval metrics. Key functionalities include working with the SentenceTransformer model, utilizing the InformationRetrievalEvaluator, and handling dataset relationships through dictionary mappings and set operations.

# Evaluation Using Sentence Transformers

In this tutorial, we will go through how to use the Sentence Transformers library to evaluate an embedding model.

## 0. Installation


```python
%pip install -U sentence-transformers
```


```python
from sentence_transformers import SentenceTransformer

# Load a model
model = SentenceTransformer('all-MiniLM-L6-v2')
```

## 1. Retrieval

Let's choose retrieval as the first task.


```python
import random

from sentence_transformers.evaluation import InformationRetrievalEvaluator

from datasets import load_dataset
```

BeIR is a well-known benchmark for retrieval. Let's use its Quora dataset for our evaluation.


```python
# Load the Quora IR dataset (https://huggingface.co/datasets/BeIR/quora, https://huggingface.co/datasets/BeIR/quora-qrels)
corpus = load_dataset("BeIR/quora", "corpus", split="corpus")
queries = load_dataset("BeIR/quora", "queries", split="queries")
relevant_docs_data = load_dataset("BeIR/quora-qrels", split="validation")
```


```python
# Shrink the corpus size heavily to only the relevant documents + 10,000 random documents
# Use a set so that the membership test inside filter() is O(1) instead of a linear scan of a list
required_corpus_ids = set(map(str, relevant_docs_data["corpus-id"]))
required_corpus_ids |= set(random.sample(corpus["_id"], k=10_000))
corpus = corpus.filter(lambda x: x["_id"] in required_corpus_ids)

# Convert the datasets to dictionaries
corpus = dict(zip(corpus["_id"], corpus["text"]))  # Our corpus (cid => document)
queries = dict(zip(queries["_id"], queries["text"]))  # Our queries (qid => question)
relevant_docs = {}  # Query ID to relevant documents (qid => set of relevant cids)
for qid, corpus_ids in zip(relevant_docs_data["query-id"], relevant_docs_data["corpus-id"]):
    qid = str(qid)
    corpus_ids = str(corpus_ids)
    if qid not in relevant_docs:
        relevant_docs[qid] = set()
    relevant_docs[qid].add(corpus_ids)
```
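To make the two steps above easier to follow, here is a minimal sketch on toy data. The names `toy_corpus`, `toy_qrels`, `small_corpus` and `toy_relevant_docs` are hypothetical stand-ins, not part of the BeIR datasets; the sketch only assumes that relevance judgments arrive as (query ID, corpus ID) pairs, as they do above.

```python
import random
from collections import defaultdict

# Hypothetical toy data standing in for the BeIR/quora corpus and qrels.
toy_corpus = {str(i): f"document {i}" for i in range(100)}  # cid => document
toy_qrels = [(1, 10), (1, 11), (2, 42)]  # (query-id, corpus-id) pairs

# IDs that must survive the shrink: every relevant document plus a random sample.
required_ids = {str(cid) for _, cid in toy_qrels}
random.seed(0)  # fixed seed only so the sketch is repeatable
required_ids |= set(random.sample(sorted(toy_corpus), k=5))

# Keep only the required documents; set membership is O(1) on average.
small_corpus = {cid: text for cid, text in toy_corpus.items() if cid in required_ids}

# Group the pairs by query: qid => set of relevant corpus IDs.
toy_relevant_docs = defaultdict(set)
for qid, cid in toy_qrels:
    toy_relevant_docs[str(qid)].add(str(cid))

print(len(small_corpus))
print(dict(toy_relevant_docs))
```

Keeping every relevant document in the reduced corpus matters: if a relevant document were dropped, no model could retrieve it and the metrics would be artificially low.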

Finally, we are ready to run the evaluation.


```python
# Given queries, a corpus and a mapping with relevant documents, the InformationRetrievalEvaluator computes different IR metrics.
ir_evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="BeIR-quora-dev",
)

results = ir_evaluator(model)
```
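The evaluator reports rank-based metrics such as recall@k and MRR@k. To give a feel for what these numbers mean, here is a simplified sketch for a single query. The helper functions `recall_at_k` and `reciprocal_rank_at_k` and the example ranking are hypothetical illustrations, not the library's own implementation.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    hits = sum(1 for cid in ranked_ids[:k] if cid in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant document in the top k, or 0.0 if none is found."""
    for rank, cid in enumerate(ranked_ids[:k], start=1):
        if cid in relevant_ids:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking returned for one query, best match first.
ranked = ["7", "10", "3", "11", "5"]
relevant = {"10", "11"}

print(recall_at_k(ranked, relevant, k=3))           # 0.5: one of two relevant docs is in the top 3
print(reciprocal_rank_at_k(ranked, relevant, k=3))  # 0.5: first relevant doc is at rank 2
```

Averaging the reciprocal rank over all queries gives MRR@k, and averaging the per-query recall gives recall@k for the whole query set.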
