# SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback

RAG systems consist of multiple modules that work together, yet these modules are usually trained separately. We argue that a system like RAG, which incorporates multiple modules, should be optimized jointly to achieve the best performance. To demonstrate this, we design a pipeline called SmartRAG that includes 1) a decision maker that decides when to retrieve, 2) a query rewriter that generates the query best suited to the retriever, 3) a retriever that returns an observation for the input query, and 4) an answer generator that produces the final answer with or without the observations. We then jointly optimize the whole system with a reinforcement learning algorithm, using a reward designed to encourage the system to achieve the highest answer quality at the lowest retrieval cost. When optimized jointly, each module becomes aware of how the other modules behave, so the modules can learn how best to work together as a complete system. Empirical results show that the jointly optimized system outperforms its separately optimized counterparts.
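The control flow of the four modules and the general shape of the reward can be sketched as follows. This is a minimal illustration, not the SmartRAG implementation: the callables `decide`, `rewrite`, `retrieve`, and `generate`, the `Trajectory` record, and the flat `retrieval_cost` penalty of 0.25 are all hypothetical placeholders, and the actual SmartRAG reward may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    """Hypothetical record of one pass through the pipeline."""
    question: str
    retrieved: bool
    query: Optional[str]
    observations: List[str] = field(default_factory=list)
    answer: str = ""


def run_pipeline(
    question: str,
    decide: Callable[[str], bool],              # 1) decision maker: retrieve or not?
    rewrite: Callable[[str], str],              # 2) query rewriter
    retrieve: Callable[[str], List[str]],       # 3) retriever
    generate: Callable[[str, List[str]], str],  # 4) answer generator
) -> Trajectory:
    if decide(question):
        query = rewrite(question)
        observations = retrieve(query)
        answer = generate(question, observations)
        return Trajectory(question, True, query, observations, answer)
    # Answer directly, without any observations.
    return Trajectory(question, False, None, [], generate(question, []))


def reward(
    traj: Trajectory,
    gold: str,
    score: Callable[[str, str], float],
    retrieval_cost: float = 0.25,  # hypothetical penalty value
) -> float:
    """Answer quality minus a penalty whenever retrieval was used (assumed shaping)."""
    r = score(traj.answer, gold)
    if traj.retrieved:
        r -= retrieval_cost
    return r


# Toy usage with stub modules (all hypothetical).
def exact(pred: str, gold: str) -> float:
    return float(pred.strip().lower() == gold.strip().lower())


corpus = {"capital of france": ["Paris is the capital of France."]}
traj = run_pipeline(
    "What is the capital of France?",
    decide=lambda q: True,
    rewrite=lambda q: "capital of france",
    retrieve=lambda q: corpus.get(q, []),
    generate=lambda q, obs: "Paris" if obs else "unknown",
)
print(traj.answer, reward(traj, "Paris", exact))  # Paris 0.75
```

Because one scalar reward is assigned to the whole trajectory, a policy-gradient learner can, in principle, push the decision maker to skip retrieval when the generator already knows the answer, and push the rewriter toward queries that actually help the generator. This is the intuition behind joint optimization rather than a guarantee of any specific outcome.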

![](framework.png)

# Install

```bash
git clone https://github.com/allenai/RL4LMs.git
cd RL4LMs
pip install -e .
```

# Quick Start - Train PPO using pre-defined YAML configs

We provide a simple training API, invoked through a training script, that trains PPO from a YAML config file.

For example, to train T5-large with PPO, run:

```bash
python scripts/training/train_text_generation.py --config_path train_scripts/t5_large_ppo.yml
```

# Evaluation

We provide code for evaluating a trained model, for example computing Exact Match (EM) and F1 scores on the PopQA dataset:

```bash
python SmartRAG/evaluation/evaluate.py --base_model_path flan-t5-large --checkpoint save_checkpoint --dataset popqa
```
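Exact Match and token-level F1 are commonly computed with SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace), sketched below. This is an illustrative sketch: we have not verified that `SmartRAG/evaluation/evaluate.py` applies exactly these rules, and `best_over_answers` is a hypothetical helper for datasets that accept several answer strings per question.

```python
import re
import string
from collections import Counter
from typing import Callable, List

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, ground_truth: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 over the normalized bag of words."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_answers(
    metric: Callable[[str, str], float], prediction: str, gold_answers: List[str]
) -> float:
    """Hypothetical helper: score against every accepted answer and keep the best."""
    return max(metric(prediction, g) for g in gold_answers)


print(exact_match("The Eiffel Tower!", "eiffel tower"))  # 1.0
print(round(f1_score("Paris, France", "Paris"), 3))      # 0.667
```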