<h1 align="center">
InstructRAG 
</h1>

<h3 align="center">
Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales <br>
[<a href="https://anonymous.4open.science/api/repo/InstructRAG-ICLR2025-Submission/file/files/paper.pdf">Anonymous PDF</a>]
<br> <span style="color:red;">(Please refresh the website if the PDF does not appear properly)</span>
</h3>

InstructRAG is a simple yet effective RAG framework that lets LMs explicitly denoise retrieved content by generating rationales, which improves verifiability and trustworthiness.

![](files/instructrag.png)

## **InstructRAG Key Features:**

- 🤖 **Self-Synthesis**: Leverages instruction-tuned LMs to generate their OWN supervision for denoising.
- 🔌 **Easy-to-Use**: Supports both in-context learning (ICL) and supervised fine-tuning (SFT).
- 🚀 **Effectiveness**: Up to 8.3% better results across 5 benchmarks ([Table 3](https://anonymous.4open.science/api/repo/InstructRAG-ICLR2025-Submission/file/files/paper.pdf)).
- 💪 **Noise Robustness**: Robust to increased noise ratios in various scenarios ([Figure 3](https://anonymous.4open.science/api/repo/InstructRAG-ICLR2025-Submission/file/files/paper.pdf)).
- 🔁 **Task Transferability**: InstructRAG also generalizes to unseen, out-of-domain tasks ([Figure 4](https://anonymous.4open.science/api/repo/InstructRAG-ICLR2025-Submission/file/files/paper.pdf)).

See our [paper](https://anonymous.4open.science/api/repo/InstructRAG-ICLR2025-Submission/file/files/paper.pdf) for more details.

## 🔗 Quick Links
- [Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales](#instructrag-key-features)
    - [Installation](#installation)
    - [Training Script](#training-script)
    - [Evaluation](#evaluation)


## Installation
Run the following script to create a Python virtual environment and install all required packages.
```shell
bash setup.sh
```

Alternatively, create a conda environment directly from the provided configuration file.

```shell
conda env create -f environment.yml
```

## Training Script
To train the model (i.e., InstructRAG-FT), activate the environment and run the following training script. The training configuration is set for 4×H100 80GB GPUs, so you may need to adjust `NUM_DEVICE` and `PER_DEVICE_BATCH_SIZE` to match your compute environment.

```shell
conda activate instrag
bash train.sh
```
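
If you lower `NUM_DEVICE` or `PER_DEVICE_BATCH_SIZE`, the global batch size shrinks unless gradient accumulation makes up the difference. The following is a minimal sketch of that arithmetic only. `TARGET_BATCH_SIZE` and `GRAD_ACCUM_STEPS` are hypothetical names, and all values are illustrative rather than taken from `train.sh`, so check the script for the settings it actually exposes.

```shell
# Illustrative values only; none of these are taken from train.sh.
NUM_DEVICE=2               # GPUs available on your machine
PER_DEVICE_BATCH_SIZE=4    # largest batch that fits in one GPU's memory
TARGET_BATCH_SIZE=32       # hypothetical global batch size you want to keep

# Samples processed per optimizer micro-step across all devices.
PER_STEP=$((NUM_DEVICE * PER_DEVICE_BATCH_SIZE))

# Round up so the effective batch size is never below the target.
GRAD_ACCUM_STEPS=$(( (TARGET_BATCH_SIZE + PER_STEP - 1) / PER_STEP ))
EFFECTIVE_BATCH_SIZE=$((PER_STEP * GRAD_ACCUM_STEPS))

echo "gradient accumulation steps: ${GRAD_ACCUM_STEPS}"
echo "effective batch size: ${EFFECTIVE_BATCH_SIZE}"
```

With the values above, the sketch yields 4 accumulation steps and an effective batch size of 32.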

## Evaluation
There are two instantiations of our framework:
- InstructRAG-Prompting: training-free & easy-to-adapt
- InstructRAG-FT: trainable & better performance

Use the following script to evaluate InstructRAG in either the training-free or the trainable setting. Specify the task and model by adjusting `DATASET` and `MODEL` in `eval.sh`.

```shell
conda activate instrag
bash eval.sh
```
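
To switch tasks or models without opening an editor, the two variables can be rewritten from the command line. The helper below is a hypothetical sketch, not part of this repository. It assumes `eval.sh` defines `DATASET` and `MODEL` as plain `NAME=value` assignments at the start of a line, which this README does not confirm.

```shell
# Hypothetical helper: rewrite a top-level NAME=... assignment in a script.
# Assumes the assignment starts at column 0 and that the new value contains
# no '|', '&', or backslash characters (they are special to sed here).
set_var() {
  local file="$1" name="$2" value="$3"
  local tmp
  tmp="$(mktemp)"
  # Write to a temp file first, then copy back to preserve file permissions.
  sed "s|^${name}=.*|${name}=${value}|" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example usage (placeholders, replace with names your eval.sh accepts):
#   set_var eval.sh DATASET <dataset-name>
#   set_var eval.sh MODEL <model-name>
#   bash eval.sh
```

Inspect `eval.sh` afterwards to confirm that both lines were updated as intended before launching a long evaluation run.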

