# Condensed: Choosing Index

Summary: This tutorial provides implementation details for different FAISS index types and their optimal use cases in vector similarity search. It covers code implementations for Flat, IVF, HNSW, Scalar Quantizer, and Product Quantizer indexes, including specific parameter configurations and initialization syntax. The tutorial helps with tasks like choosing and implementing the right index type based on dataset size, memory constraints, and speed requirements. Key features include index-specific parameter tuning, performance trade-offs between speed/memory/accuracy, and a recall evaluation function, making it valuable for building efficient vector search systems.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details and key concepts:

# Choosing FAISS Indexes - Implementation Guide

## Setup & Prerequisites

```python
# CPU Installation
# pip install -U faiss-cpu numpy h5py

# GPU Installation (Linux x86_64)
# conda install -c pytorch -c nvidia faiss-gpu=1.8.0

import faiss
import numpy as np
import h5py
```

## Key Index Implementations & Performance Characteristics

### 1. Flat Index (Baseline)
- Brute force search with 100% recall
- Used as ground truth reference

```python
index = faiss.IndexFlatL2(d)
index.add(corpus)
```

### 2. IVF Index
- Good balance of speed and recall
- Key parameters:
  - nlist: number of clusters (e.g., 5)
  - nprobe: number of clusters to search (e.g., 3)

```python
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.nprobe = nprobe
index.train(corpus)
index.add(corpus)
```

### 3. HNSW Index
- Extremely fast search
- Higher memory usage
- Critical parameters:
  - M: connections per layer (e.g., 64)
  - ef_search: search depth (e.g., 32)
  - ef_construction: build quality (e.g., 64)

```python
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = ef_construction
index.hnsw.efSearch = ef_search
index.add(corpus)
```

### 4. Scalar Quantizer Index
- Good for integer-range data
- Configuration options:
  - QT_8bit quantization type
  - METRIC_L2 distance metric

```python
index = faiss.IndexScalarQuantizer(d, 
                                  faiss.ScalarQuantizer.QT_8bit,
                                  faiss.METRIC_L2)
index.train(corpus)
index.add(corpus)
```

### 5. Product Quantizer Index
- Balanced trade-off between speed/memory/accuracy
- Key parameters:
  - M: number of subquantizers (e.g., 16)
  - nbits: bits per subquantizer (e.g., 8)

```python
index = faiss.IndexPQ(d, M, nbits, faiss.METRIC_L2)
index.train(corpus)
index.add(corpus)
```

## Best Practices & Selection Guidelines

1. Use Flat Index when:
   - Small dataset
   - Accuracy is critical
   - Memory/speed not constrained

2. Use HNSW when:
   - Search speed is critical
   - Memory constraints are flexible
   - High recall needed

3. Use IVF when:
   - Balanced performance needed
   - Medium-sized datasets
   - Moderate memory constraints

4. Use Product Quantizer when:
   - Large datasets
   - Memory is constrained
   - Moderate recall is acceptable

## Helper Function for Evaluation
```python
def compute_recall(res, truth):
    recall = 0
    for i in range(len(res)):
        intersect = np.intersect1d(res[i], truth[i])
        recall += len(intersect) / len(res[i])
    return recall / len(res)
```