Summary: This tutorial provides implementation details for Faiss quantization techniques, specifically Scalar Quantization and Product Quantization (PQ) for high-dimensional vector compression and similarity search. It demonstrates how to implement basic quantizers and their corresponding indexes, with code examples for both standalone quantizers and index-based approaches. The tutorial covers key parameters like dimension (d), sub-vectors (M), and bits per quantizer (nbits), helping with tasks such as vector compression, nearest neighbor search, and large-scale similarity search optimization. Notable features include memory-efficient vector storage, fast similarity search capabilities, and IVF clustering for improved search performance on large datasets.

# Faiss Quantizers

In this notebook, we will introduce the quantizer object in Faiss and how to use them.

## Preparation

For CPU usage, run:


```python
%pip install faiss-cpu
```

For GPU on Linux x86_64 system, use Conda:

```conda install -c pytorch -c nvidia faiss-gpu=1.8.0```


```python
import faiss
import numpy as np

np.random.seed(768)

data = np.random.random((1000, 128))
```

## 1. Scalar Quantizer

Normal data type of vector embeedings is usually 32 bit floats. Scalar quantization is transforming the 32 float representation to, for example, 8 bit interger. Thus with a 4x reduction in size. In this way, it can be seen as we distribute each dimension into 256 buckets.

| Name | Class | Parameters |
|:------------:|:--------:|:-----------|
| `ScalarQuantizer` | Quantizer class | `d`: dimension of vectors<br>`qtype`: map dimension into $2^\text{qtype}$ clusters |
| `IndexScalarQuantizer` | Flat index class | `d`: dimension of vectors<br>`qtype`: map dimension into $2^\text{qtype}$ clusters<br>`metric`: similarity metric (L2 or IP) |
| `IndexIVFScalarQuantizer` | IVF index class | `d`: dimension of vectors<br>`nlist`: number of cells/clusters to partition the inverted file space<br>`qtype`: map dimension into $2^\text{qtype}$ clusters<br>`metric`: similarity metric (L2 or IP)

Quantizer class objects are used to compress the data before adding into indexes. Flat index class objects and IVF index class objects can be used direct as and index. Quantization will be done automatically.

### Scalar Quantizer


```python
d = 128
qtype = faiss.ScalarQuantizer.QT_8bit

quantizer = faiss.ScalarQuantizer(d, qtype)

quantizer.train(data)
new_data = quantizer.compute_codes(data)

print(new_data[0])
```

### Scalar Quantizer Index


```python
d = 128
k = 3
qtype = faiss.ScalarQuantizer.QT_8bit
# nlist = 5

index = faiss.IndexScalarQuantizer(d, qtype, faiss.METRIC_L2)
# index = faiss.IndexIVFScalarQuantizer(d, nlist, faiss.ScalarQuantizer.QT_8bit, faiss.METRIC_L2)

index.train(data)
index.add(data)
```


```python
D, I = index.search(data[:1], k)

print(f"closest elements: {I}")
print(f"distance: {D}")
```

## 2. Product Quantizer

When speed and memory are crucial factors in searching, product quantizer becomes a top choice. It is one of the effective quantizer on reducing memory size. 

The first step of PQ is dividing the original vectors with dimension `d` into smaller, low-dimensional sub-vectors with dimension `d/m`. Here `m` is the number of sub-vectors.

Then clustering algorithms are used to create codebook of a fixed number of centroids.

Next, each sub-vector of a vector is replaced by the index of the closest centroid from its corresponding codebook. Now each vector will be stored with only the indices instead of the full vector.

When comuputing the distance between a query vector. Only the distances to the centroids in the codebooks are calculated, thus enable the quick approximate nearest neighbor searches.

| Name | Class | Parameters |
|:------------:|:--------:|:-----------|
| `ProductQuantizer` | Quantizer class | `d`: dimension of vectors<br>`M`: number of sub-vectors that D % M == 0<br>`nbits`: number of bits per subquantizer, so each contain $2^\text{nbits}$ centroids |
| `IndexPQ` | Flat index class | `d`: dimension of vectors<br>`M`: number of sub-vectors that D % M == 0<br>`nbits`: number of bits per subquantizer, so each contain $2^\text{nbits}$ centroids<br>`metric`: similarity metric (L2 or IP) |
| `IndexIVFPQ` | IVF index class | `quantizer`: the quantizer used in computing distance phase.<br>`d`: dimension of vectors<br>`nlist`: number of cells/clusters to partition the inverted file space<br>`M`: number of sub-vectors that D % M == 0<br>`nbits`: number of bits per subquantizer, so each contain $2^\text{nbits}$ centroids<br>`metric`: similarity metric (L2 or IP) |

### Product Quantizer


```python
d = 128
M = 8
nbits = 4

quantizer = faiss.ProductQuantizer(d, M, nbits)

quantizer.train(data)
new_data = quantizer.compute_codes(data)

print(new_data.max())
print(new_data[:2])
```

### Product Quantizer Index


```python
index = faiss.IndexPQ(d, M, nbits, faiss.METRIC_L2)

index.train(data)
index.add(data)
```


```python
D, I = index.search(data[:1], k)

print(f"closest elements: {I}")
print(f"distance: {D}")
```

### Product Quantizer IVF Index


```python
nlist = 5

quantizer = faiss.IndexFlat(d, faiss.METRIC_L2)
index = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits, faiss.METRIC_L2)

index.train(data)
index.add(data)
```


```python
D, I = index.search(data[:1], k)

print(f"closest elements: {I}")
print(f"distance: {D}")
```
