Summary: This tutorial demonstrates the implementation of a Retrieval-Augmented Generation (RAG) system using LangChain, covering PDF document processing, text chunking, vector embeddings, and query pipeline setup. It provides specific code for creating a document retrieval system using FAISS vector storage, HuggingFace embeddings (specifically bge-base-en-v1.5 model), and customizable text splitting parameters. The tutorial helps with tasks like document Q&A, information retrieval, and building RAG applications, featuring key functionalities such as PDF loading, text chunking with configurable overlap, vector database creation and persistence, and structured prompt templating for response generation.

# RAG with LangChain

LangChain is widely adopted in the open-source community thanks to its broad functionality and clean API. In this tutorial, we show how to use LangChain to build a RAG pipeline.

## 0. Preparation

First, install all the required packages:


```python
%pip install pypdf faiss-cpu langchain langchain-community langchain-openai langchain-huggingface
```

Then fill in your OpenAI API key below:


```python
# For openai key
import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"
```

BGE-M3 is a very powerful embedding model. We would like to know what the 'M3' stands for.

Let's first ask GPT the question:


```python
from langchain_openai.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4o-mini")

response = llm.invoke("What does M3-Embedding stand for?")
print(response.content)
```

A quick check against the GitHub [repo](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/BGE_M3) of BGE-M3 shows that this answer is not correct. Since the BGE-M3 paper is not in its training dataset, GPT is not able to give us the correct answer.

Now, let's use the [paper](https://arxiv.org/pdf/2402.03216) of BGE-M3 to build a RAG application that answers our question precisely.

## 1. Data

The first step is to load the PDF of our paper:


```python
from langchain_community.document_loaders import PyPDFLoader

# Or download the paper and put a path to the local file instead
loader = PyPDFLoader("https://arxiv.org/pdf/2402.03216")
docs = loader.load()
```


```python
print(docs[0].metadata)
```

The whole paper contains 18 pages, which is a large amount of text. We therefore split the paper into chunks to construct a corpus.


```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# initialize a splitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum number of characters per chunk
    chunk_overlap=150,  # number of characters shared by consecutive chunks
)

# use the splitter to split our paper
corpus = splitter.split_documents(docs)
```
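
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of overlapping character windows in plain Python. The helper `chunk_with_overlap` is a hypothetical name used only for illustration and is a simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line boundaries, so its chunks will not match this output exactly.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only (hypothetical helper, not the LangChain
    implementation, which also prefers to cut at natural separators).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The last three characters of each chunk reappear at the start of the next one, so a sentence that straddles a chunk boundary is less likely to be cut off from its context.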

## 2. Indexing

Indexing is one of the most important parts of RAG. LangChain provides APIs for embedding models and vector databases that make this step simple and straightforward.

Here, we choose bge-base-en-v1.5 to embed all the chunks into vectors, and use Faiss as our vector database.


```python
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name="BAAI/bge-base-en-v1.5",
    encode_kwargs={"normalize_embeddings": True},
)
```
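
The `normalize_embeddings` option scales every embedding to unit length. The following sketch, which uses small hand-made vectors rather than real model output, illustrates why that is convenient: for unit vectors the inner product equals the cosine similarity, and the squared Euclidean distance is just `2 - 2 * cosine`, so either metric yields the same ranking. All helper names here are illustrative.

```python
import math


def l2_normalize(vec):
    """Scale a vector so that its Euclidean length is 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy 2-d vectors standing in for real embeddings
a, b = [3.0, 4.0], [1.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# Inner product of unit vectors equals cosine similarity of the originals
print(math.isclose(dot(a_unit, b_unit), cosine(a, b)))
# Squared L2 distance of unit vectors is 2 - 2 * cosine
print(math.isclose(squared_l2(a_unit, b_unit), 2 - 2 * cosine(a, b)))
```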

Then create a Faiss vector database from our corpus and embedding model.

If you want to know more about Faiss, refer to the tutorial on [Faiss and indexing](https://github.com/FlagOpen/FlagEmbedding/tree/master/Tutorials/3_Indexing).


```python
from langchain_community.vectorstores import FAISS

vectordb = FAISS.from_documents(corpus, embedding_model)

# (optional) save the vector database to a local directory
vectordb.save_local("vectorstore.db")
```


```python
# Create retriever for later use
retriever = vectordb.as_retriever()
```
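
Conceptually, the retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below mimics that step with a brute-force search over hand-made toy vectors; `top_k` and the `toy_*` names are hypothetical and Faiss is not involved. To my knowledge the LangChain retriever returns 4 chunks unless told otherwise, and the number can be changed with `vectordb.as_retriever(search_kwargs={"k": ...})`; treat that default as an assumption and check it against your installed version.

```python
def top_k(query_vec, doc_vecs, k):
    """Return indices of the k vectors with the highest inner product with the query."""
    scores = [sum(q * d for q, d in zip(query_vec, vec)) for vec in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy corpus: placeholder chunk labels with hand-made 2-d "embeddings"
toy_chunks = ["chunk A", "chunk B", "chunk C"]
toy_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
toy_query = [0.9, 0.1]

hits = top_k(toy_query, toy_vecs, k=2)
print([toy_chunks[i] for i in hits])
# ['chunk A', 'chunk C']
```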

## 3. Retrieve and Generate

Let's write a simple prompt template. Modify its contents to match your own use case.


```python
from langchain_core.prompts import ChatPromptTemplate

template = """
You are a Q&A chat bot.
Answer the question using only the given context.

<context>
{context}
</context>

Question: {input}
"""

# Create a prompt template
prompt = ChatPromptTemplate.from_template(template)
```
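
The two placeholders in the template are filled in at query time: `{context}` receives the text of the retrieved chunks and `{input}` receives the user's question. The plain-Python sketch below imitates that substitution with `str.format` on a shortened template. The chunk strings are placeholders, and joining chunks with a blank line is an assumption about the default separator rather than something this tutorial configures.

```python
# Shortened stand-in for the template above, with the same two placeholders
toy_template = "<context>\n{context}\n</context>\n\nQuestion: {input}"

# Placeholder strings standing in for retrieved chunks
retrieved = ["First retrieved chunk.", "Second retrieved chunk."]

filled = toy_template.format(
    context="\n\n".join(retrieved),    # chunks are concatenated into one block
    input="What does M3 stand for?",   # the user's question
)
print(filled)
```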

Now everything is ready. Assemble the pieces into a chain and let the magic happen!


```python
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

doc_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, doc_chain)
```
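
As a rough mental model, the assembled chain performs three steps: retrieve chunks for the question, place them into the prompt, and send the prompt to the LLM. The sketch below reproduces that flow with stub functions; `fake_retriever`, `fake_llm`, and `toy_retrieval_chain` are hypothetical stand-ins, not LangChain APIs. It also mirrors the shape of the result, a dictionary that carries the original input, the retrieved context, and the answer.

```python
def fake_retriever(question):
    """Stand-in for the retriever: returns placeholder chunks."""
    return ["placeholder chunk 1", "placeholder chunk 2"]


def fake_llm(prompt_text):
    """Stand-in for the LLM: reports the prompt size instead of answering."""
    return f"(answer generated from a {len(prompt_text)}-character prompt)"


def toy_retrieval_chain(inputs):
    docs = fake_retriever(inputs["input"])                 # 1. retrieve
    prompt_text = (                                        # 2. fill the prompt
        "<context>\n" + "\n\n".join(docs) + "\n</context>\n\n"
        "Question: " + inputs["input"]
    )
    answer = fake_llm(prompt_text)                         # 3. generate
    return {**inputs, "context": docs, "answer": answer}


toy_response = toy_retrieval_chain({"input": "What does M3 stand for?"})
print(sorted(toy_response))
# ['answer', 'context', 'input']
```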

Run the following cell. We can see that the chatbot now answers the question correctly!


```python
response = chain.invoke({"input": "What does M3-Embedding stand for?"})

# print the answer only
print(response['answer'])
```
