# Condensed: Faiss Indexes

Summary: This tutorial provides implementation guidance for Faiss, a vector similarity search library, covering four main index types: IndexFlat (exact search), IndexIVF (clustering-based), IndexHNSW (graph-based), and IndexLSH (hash-based). It helps with tasks involving nearest neighbor search in vector spaces, particularly useful for recommendation systems and similarity matching. Key features include installation instructions, index-specific parameter tuning (nlist, nprobe, M, efConstruction), best practices for performance optimization, and basic usage patterns with numpy arrays. The tutorial emphasizes trade-offs between speed, accuracy, and memory usage across different index types, helping developers choose the appropriate index for their specific use case.

*This is a condensed version that preserves essential implementation details and context.*

Here's the condensed tutorial focusing on essential implementation details and key concepts:

# Faiss Indexes Tutorial

## Installation
```python
# CPU version
pip install faiss-cpu

# GPU version (Linux x86_64)
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
```

## Key Index Types

### 1. IndexFlat
- Basic index with no preprocessing/compression
- Guarantees global optimum results but slower
- Best for small datasets where accuracy is critical

```python
# Basic implementation
d = 128  # dimension
k = 3    # nearest neighbors
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(data)
D, I = index.search(query, k)
```

### 2. IndexIVF
- Uses k-means clustering for faster searching
- Key parameters:
  - `nlist`: number of clusters
  - `nprobe`: number of clusters to search
  
```python
nlist = 5
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(data)  # Required training step
index.add(data)
index.nprobe = 8   # Set before searching
```

**Best Practice**: Balance `nlist` and `nprobe` for speed/accuracy tradeoff

### 3. IndexHNSW
- Graph-based method with multi-layer structure
- Key parameters:
  - `M`: neighbors per node
  - `efConstruction`: build-time exploration
  - `efSearch`: search-time exploration

```python
index = faiss.IndexHNSWFlat(d, M=32)
index.hnsw.efConstruction = 32
index.hnsw.efSearch = 16
index.add(data)
```

**Warning**: 
- No vector removal support
- Memory-intensive
- Best when RAM isn't limited

### 4. IndexLSH
- Hash-based grouping of similar vectors
- Parameter `nbits` controls resolution
- Best for low-dimensional vectors

```python
index = faiss.IndexLSH(d, nbits=d*8)
index.train(data)
index.add(data)
```

**Warning**: Not recommended for high-dimensional vectors (e.g., transformer embeddings)

## Common Usage Pattern
```python
# Setup
import faiss
import numpy as np
data = np.random.random((1000, 128))

# Search
D, I = index.search(query_vectors, k)
# D: distances
# I: indices of nearest neighbors
```