Summary: This tutorial demonstrates a practical implementation of a Retrieval-Augmented Generation (RAG) system using BGE embeddings, Faiss vector storage, and GPT-4o-mini. It covers essential techniques for building a restaurant recommendation system, including embedding generation with BGE, efficient similarity search with Faiss IndexFlatIP, and structured prompt engineering. Key functionalities include data preparation, vector indexing, semantic search, and response generation. The tutorial is particularly useful for tasks involving semantic search, content retrieval, and recommendation systems, with specific focus on optimizing embedding efficiency through FP16 and implementing proper query instructions for retrieval.

# Simple RAG From Scratch

In this tutorial, we will use BGE, Faiss, and OpenAI's GPT-4o-mini to build a simple RAG system from scratch.

## 0. Preparation

Install the required packages in the environment:


```python
%pip install -U numpy faiss-cpu FlagEmbedding openai
```

## 1. Data

Suppose I'm a resident of New York Manhattan, and I want the AI bot to provide suggestion on where should I go for dinner. It's not reliable to let it recommend some random restaurant. So let's provide a bunch of our favorate restaurants.


```python
corpus = [
    "Cheli: A downtown Chinese restaurant presents a distinctive dining experience with authentic and sophisticated flavors of Shanghai cuisine. Avg cost: $40-50",
    "Masa: Midtown Japanese restaurant with exquisite sushi and omakase experiences crafted by renowned chef Masayoshi Takayama. The restaurant offers a luxurious dining atmosphere with a focus on the freshest ingredients and exceptional culinary artistry. Avg cost: $500-600",
    "Per Se: A midtown restaurant features daily nine-course tasting menu and a nine-course vegetable tasting menu using classic French technique and the finest quality ingredients available. Avg cost: $300-400",
    "Ortomare: A casual, earthy Italian restaurant locates uptown, offering wood-fired pizza, delicious pasta, wine & spirits & outdoor seating. Avg cost: $30-50",
    "Banh: Relaxed, narrow restaurant in uptown, offering Vietnamese cuisine & sandwiches, famous for its pho and Vietnam sandwich. Avg cost: $20-30",
    "Living Thai: An uptown typical Thai cuisine with different kinds of curry, Tom Yum, fried rice, Thai ice tea, etc. Avg cost: $20-30",
    "Chick-fil-A: A Fast food restaurant with great chicken sandwich, fried chicken, fries, and salad, which can be found everywhere in New York. Avg cost: 10-20",
    "Joe's Pizza: Most famous New York pizza locates midtown, serving different flavors including classic pepperoni, cheese, spinach, and also innovative pizza. Avg cost: $15-25",
    "Red Lobster: In midtown, Red Lobster is a lively chain restaurant serving American seafood standards amid New England-themed decor, with fair price lobsters, shrips and crabs. Avg cost: $30-50",
    "Bourbon Steak: It accomplishes all the traditions expected from a steakhouse, offering the finest cuts of premium beef and seafood complimented by wine and spirits program. Avg cost: $100-150",
    "Da Long Yi: Locates in downtown, Da Long Yi is a Chinese Szechuan spicy hotpot restaurant that serves good quality meats. Avg cost: $30-50",
    "Mitr Thai: An exquisite midtown Thai restaurant with traditional dishes as well as creative dishes, with a wonderful bar serving cocktails. Avg cost: $40-60",
    "Yichiran Ramen: Famous Japenese ramen restaurant in both midtown and downtown, serving ramen that can be designed by customers themselves. Avg cost: $20-40",
    "BCD Tofu House: Located in midtown, it's famous for its comforting and flavorful soondubu jjigae (soft tofu stew) and a variety of authentic Korean dishes. Avg cost: $30-50",
]

user_input = "I want some Chinese food"
```

## 2. Indexing

Now we need to figure out a fast but powerful enough method to retrieve docs in the corpus that are most closely related to our questions. Indexing is a good choice for us.

The first step is embed each document into a vector. We use bge-base-en-v1.5 as our embedding model.


```python
from FlagEmbedding import FlagModel

model = FlagModel('BAAI/bge-base-en-v1.5',
                  query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
                  use_fp16=True)

embeddings = model.encode(corpus, convert_to_numpy=True)
```


```python
embeddings.shape
```

Then, let's create a Faiss index and add all the vectors into it.

If you want to know more about Faiss, refer to the tutorial of [Faiss and indexing](https://github.com/FlagOpen/FlagEmbedding/tree/master/Tutorials/3_Indexing).


```python
import faiss
import numpy as np

index = faiss.IndexFlatIP(embeddings.shape[1])

index.add(embeddings)
```


```python
index.ntotal
```

## 3. Retrieve and Generate

Now we come to the most exciting part. Let's first embed our query and retrieve 3 most relevant document from it:


```python
q_embedding = model.encode_queries([user_input], convert_to_numpy=True)

D, I = index.search(q_embedding, 3)
res = np.array(corpus)[I]

res
```

Then set up the prompt for the chatbot:


```python
prompt="""
You are a bot that makes recommendations for restaurants. 
Please be brief, answer in short sentences without extra information.

These are the restaurants list:
{recommended_activities}

The user's preference is: {user_input}
Provide the user with 2 recommended restaurants based on the user's preference.
"""
```

Fill in your OpenAI API key below:


```python
import os

os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
```

Finally let's see how the chatbot give us the answer!


```python
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {
            "role": "user",
            "content": prompt.format(user_input=user_input, recommended_activities=res)
        }
    ]
).choices[0].message
```


```python
print(response.content)
```
