Summary: This tutorial provides implementation guidance for Faiss GPU, focusing on vector similarity search acceleration using GPUs. It demonstrates three key implementation approaches: single GPU, all available GPUs, and specific multiple GPUs configuration. The tutorial covers essential code patterns for index creation, GPU resource management, and data transfer between CPU and GPU. Key functionalities include proper installation, basic index setup, GPU resource allocation (with 18% VRAM allocation), and result verification. This guide is particularly useful for tasks involving large-scale vector similarity search and nearest neighbor computations that need GPU acceleration.

# Faiss GPU

In the last tutorial, we went through the basics of indexing using faiss-cpu. While for the use cases in research and industry. The size of dataset for indexing will be extremely large, the frequency of searching might also be very high. In this tutorial we'll see how to combine Faiss and GPU almost seamlessly.

## 1. Installation

Faiss maintain the latest updates on conda. And its gpu version only supports Linux x86_64

create a conda virtual environment and run:

```conda install -c pytorch -c nvidia faiss-gpu=1.8.0```

make sure you select that conda env as the kernel for this notebook. After installation, restart the kernal.

If your system does not satisfy the requirement, install faiss-cpu and just skip the steps with gpu related codes.

## 2. Data Preparation

First let's create two datasets with "fake embeddings" of corpus and queries:


```python
import faiss
import numpy as np

dim = 768
corpus_size = 1000
# np.random.seed(111)

corpus = np.random.random((corpus_size, dim)).astype('float32')
```

## 3. Create Index on CPU

### Option 1:

Faiss provides a great amount of choices of indexes by initializing directly:


```python
# first build a flat index (on CPU)
index = faiss.IndexFlatIP(dim)
```

### Option 2:

Besides the basic index class, we can also use the index_factory function to produce composite Faiss index.


```python
index = faiss.index_factory(dim, "Flat", faiss.METRIC_L2)
```

## 4. Build GPU Index and Search

All the GPU indexes are built with `StandardGpuResources` object. It contains all the needed resources for each GPU in use. By default it will allocate 18% of the total VRAM as a temporary scratch space.

The `GpuClonerOptions` and `GpuMultipleClonerOptions` objects are optional when creating index from cpu to gpu. They are used to adjust the way the GPUs stores the objects.

### Single GPU:


```python
# use a single GPU
rs = faiss.StandardGpuResources()
co = faiss.GpuClonerOptions()

# then make it to gpu index
index_gpu = faiss.index_cpu_to_gpu(provider=rs, device=0, index=index, options=co)
```


```python
%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)
```

### All Available GPUs

If your system contains multiple GPUs, Faiss provides the option to deploy al available GPUs. You can control their usages through `GpuMultipleClonerOptions`, e.g. whether to shard or replicate the index acrross GPUs.


```python
# cloner options for multiple GPUs
co = faiss.GpuMultipleClonerOptions()

index_gpu = faiss.index_cpu_to_all_gpus(index=index, co=co)
```


```python
%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)
```

### Multiple GPUs

There's also option that use multiple GPUs but not all:


```python
ngpu = 4
resources = [faiss.StandardGpuResources() for _ in range(ngpu)]
```

Create vectors for the GpuResources and divices, then pass them to the index_cpu_to_gpu_multiple() function.


```python
vres = faiss.GpuResourcesVector()
vdev = faiss.Int32Vector()
for i, res in zip(range(ngpu), resources):
    vdev.push_back(i)
    vres.push_back(res)
index_gpu = faiss.index_cpu_to_gpu_multiple(vres, vdev, index)
```


```python
%%time
index_gpu.add(corpus)
D, I = index_gpu.search(corpus, 4)
```

## 5. Results

All the three approaches should lead to identical result. Now let's do a quick sanity check:


```python
# The nearest neighbor of each vector in the corpus is itself
assert np.all(corpus[:] == corpus[I[:, 0]])
```

And the corresponding distance should be 0.


```python
print(D[:3])
```
