# Condensed: Simple RAG From Scratch

Summary: This tutorial demonstrates a practical implementation of a Retrieval-Augmented Generation (RAG) system using BGE embeddings, Faiss vector storage, and GPT-4o-mini. It covers essential techniques for building a restaurant recommendation system, including embedding generation with BGE, efficient similarity search with Faiss IndexFlatIP, and structured prompt engineering. Key functionalities include data preparation, vector indexing, semantic search, and response generation. The tutorial is particularly useful for tasks involving semantic search, content retrieval, and recommendation systems, with specific focus on optimizing embedding efficiency through FP16 and implementing proper query instructions for retrieval.

*This is a condensed version that preserves essential implementation details and context.*


# Simple RAG Implementation with BGE, Faiss, and GPT-4o-mini

## Key Components & Setup
```python
# Notebook magic; in a plain shell, run the same command without the leading %
%pip install -U numpy faiss-cpu FlagEmbedding openai
```

## Implementation Steps

### 1. Data Preparation
```python
corpus = [
    "Cheli: A downtown Chinese restaurant...",
    "Masa: Midtown Japanese restaurant...",
    # ... more restaurant entries
]
user_input = "I want some Chinese food"
```

### 2. Indexing
```python
# Initialize embedding model
from FlagEmbedding import FlagModel
model = FlagModel('BAAI/bge-base-en-v1.5',
                  query_instruction_for_retrieval="Represent this sentence for searching relevant passages:",
                  use_fp16=True)

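# Embed the corpus (passages are encoded without the query instruction)
embeddings = model.encode(corpus, convert_to_numpy=True)

# Build an exact inner-product index over the embeddings
import faiss
import numpy as np
# Faiss works on float32; cast explicitly in case FP16 inference returned float16
embeddings = np.asarray(embeddings, dtype=np.float32)
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)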
```
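
To make concrete what an exact inner-product index does with these vectors, here is a minimal NumPy sketch of brute-force top-k search. It is an illustration under assumptions, not Faiss's implementation: the helper name `flat_ip_search` and the toy 2-D vectors are hypothetical, and real BGE embeddings have far more dimensions.

```python
import numpy as np

def flat_ip_search(doc_vecs, query_vecs, k):
    """Brute-force top-k search by inner product (hypothetical helper)."""
    # Score every query against every document: shape (n_queries, n_docs)
    scores = query_vecs @ doc_vecs.T
    # Indices of the k highest scores per query, best first
    top_idx = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_idx, axis=1)
    return top_scores, top_idx

# Toy "embeddings", L2-normalized so inner product equals cosine similarity
docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = np.array([[1.0, 0.2]], dtype=np.float32)
query /= np.linalg.norm(query, axis=1, keepdims=True)

D_toy, I_toy = flat_ip_search(docs, query, k=2)
print(I_toy)  # one row of document indices per query, most similar first
```

The returned arrays have the same `(n_queries, k)` layout as the `D, I` pair that `index.search` returns in the next step.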

### 3. Retrieval & Generation
```python
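# Embed the query (encode_queries applies the query instruction) and retrieve the top 3
q_embedding = model.encode_queries([user_input], convert_to_numpy=True)
D, I = index.search(np.asarray(q_embedding, dtype=np.float32), 3)  # D: scores, I: corpus indices
# I has shape (1, 3); take the first row and join the hits into plain text for the prompt
res = "\n".join(corpus[i] for i in I[0])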

# Prompt template
prompt = """
You are a bot that makes recommendations for restaurants.
Please be brief, answer in short sentences without extra information.

Here is the list of restaurants:
{recommended_activities}

The user's preference is: {user_input}
Provide the user with 2 recommended restaurants based on the user's preference.
"""

# OpenAI API call (the client reads the API key from the OPENAI_API_KEY environment variable)
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt.format(user_input=user_input, 
                                                 recommended_activities=res)}
    ]
).choices[0].message

# The generated recommendation text
print(response.content)
```
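
Because `index.search` returns a 2-D index array with one row per query, the hits for a single query sit in the first row. The sketch below shows one way to turn that row into a readable listing before it goes into the prompt. The names `TEMPLATE` and `build_prompt`, the shortened template, and the stand-in data are all hypothetical and only serve the illustration.

```python
import numpy as np

# Shortened, hypothetical template; the full prompt above works the same way
TEMPLATE = (
    "Here is the list of restaurants:\n"
    "{restaurants}\n\n"
    "The user's preference is: {user_input}\n"
)

def build_prompt(corpus, indices, user_input):
    """Format the hits for a single query into the prompt (hypothetical helper)."""
    # indices has shape (n_queries, k); take the row for the first query
    hits = [corpus[i] for i in indices[0]]
    listing = "\n".join(f"- {doc}" for doc in hits)
    return TEMPLATE.format(restaurants=listing, user_input=user_input)

# Stand-in values shaped like the output of index.search for one query, k=2
sample_corpus = [
    "Cheli: A downtown Chinese restaurant...",
    "Masa: Midtown Japanese restaurant...",
    "Placeholder: a third entry",
]
sample_I = np.array([[0, 2]])

sample_prompt = build_prompt(sample_corpus, sample_I, "I want some Chinese food")
print(sample_prompt)
```

Passing a plain string keeps NumPy array formatting (brackets, quotes, dtype hints) out of the text the LLM sees.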

## Critical Configurations
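- BGE model: `BAAI/bge-base-en-v1.5`
- Faiss index type: `IndexFlatIP` (exact inner-product search; equivalent to cosine similarity when the embeddings are L2-normalized)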
- Top-k retrieval: 3 documents
- LLM: `gpt-4o-mini`

## Best Practices
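1. Set `use_fp16=True` to run the embedding model in half precision, which speeds up encoding at a small cost in numerical precision.
2. Pass the retrieval instruction via `query_instruction_for_retrieval` and embed queries with `encode_queries` so the instruction is applied to queries only; embed the corpus with `encode`.
3. Keep prompt instructions clear and concise.
4. Retrieve an appropriate number of documents for context (3 here, from which the LLM recommends 2).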
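
The asymmetry behind the query-instruction practice can be sketched in a few lines: the instruction is attached to queries, while passages are embedded unchanged. This is a conceptual sketch only. `add_instruction` is a hypothetical helper, not part of FlagEmbedding, and the exact way the library joins the instruction and the query (separator, formatting) is an assumption here.

```python
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages:"

def add_instruction(queries, instruction=QUERY_INSTRUCTION):
    """Prefix each query with the retrieval instruction (hypothetical helper)."""
    # The single-space separator is an assumption for illustration
    return [f"{instruction} {q}" for q in queries]

queries = ["I want some Chinese food"]
passages = ["Cheli: A downtown Chinese restaurant..."]

model_inputs_for_queries = add_instruction(queries)
model_inputs_for_passages = passages  # passages are passed to the model as-is

print(model_inputs_for_queries[0])
```

In practice this step does not need to be written by hand: supplying `query_instruction_for_retrieval` and calling `encode_queries`, as in the indexing and retrieval code, is what the tutorial relies on.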