# RCD: Retrieval-augmented Contextual Decoding

## 🛠️ Environment Setup

Create a conda environment from the provided YAML file:

```bash
conda env create -f rcd.yaml
conda activate rcd
```

## 📁 Repository Structure

```bash
submission/
├── data/                # downloaded Biographies data
├── prompt_templates/
├── utils/
├── precompute_space.py  # constructs the grounding space
├── params.py
├── evaluate.py          # helper functions for loading data and evaluation
├── inference_utils.py   # main methods for the decoding strategies
├── run.py               # runs experiments
└── run.sh               # example script with commands

## 🚀 Getting Started

Follow the workflow below to reproduce the results or run your own experiments.

---

## 🔧 Precompute Embedding & Logit Space

```bash
python precompute_space.py --base_model='qwen2.5-3b' --train_data='truthful_qa' --base_embed_model='all-MiniLM-L6-v2' --sample_num=2
python precompute_space.py --base_model='qwen2.5-3b' --train_data='bio' --base_embed_model='all-MiniLM-L6-v2' --sample_num=2
python precompute_space.py --base_model='qwen2.5-3b' --train_data='wiki' --base_embed_model='all-MiniLM-L6-v2' --sample_num=2
```
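The three commands differ only in `--train_data`, so they can be generated in a loop. The following is a minimal sketch, not part of the repository: `DRY_RUN` and `SAMPLE_NUM` are hypothetical helper variables, and by default the function only prints the commands instead of executing them.

```bash
# DRY_RUN and SAMPLE_NUM are hypothetical helper variables, not repository flags.
DRY_RUN="${DRY_RUN:-1}"        # 1 = print the commands only, 0 = also run them
SAMPLE_NUM="${SAMPLE_NUM:-2}"  # same value as in the commands above

precompute_all() {
  local data
  for data in truthful_qa bio wiki; do
    # Build the command as positional parameters so it can be printed and run as-is.
    set -- python precompute_space.py \
      --base_model=qwen2.5-3b \
      --train_data="$data" \
      --base_embed_model=all-MiniLM-L6-v2 \
      --sample_num="$SAMPLE_NUM"
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
      "$@" || return 1   # stop at the first failing dataset
    fi
  done
}

precompute_all
```

Setting `DRY_RUN=0` before calling `precompute_all` would execute each command after printing it.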

## 🧪 Run Evaluation (Main results)

Each command evaluates one decoding strategy (`greedy`, `dola`, `instructive`, `icl`, or `adaptive`) on one dataset. Trailing comments mark which runs correspond to ID, CAD, and RCD.

```bash
# TruthfulQA
python run.py --base_model='qwen2.5-3b' --method='greedy' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --evaluation_type='gemini' --batch_size=160 # max_sample_num=417
python run.py --base_model='qwen2.5-3b' --method='dola' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --evaluation_type='gemini' --batch_size=160
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --noisy_prompt_key='opposite_zero' --evaluation_type='gemini' --batch_size=160 # ID
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --noisy_prompt_key='cad_zero' --evaluation_type='gemini' --batch_size=160 # CAD
python run.py --base_model='qwen2.5-3b' --method='icl' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --train_data='truthful_qa' --evaluation_type='gemini' --batch_size=160
python run.py --base_model='qwen2.5-3b' --method='adaptive' --data_split='test' --max_sample_num=2 --eval_data='truthful_qa' --train_data='truthful_qa' --evaluation_type='gemini' --batch_size=160  # RCD

# Biographies
python run.py --base_model='qwen2.5-3b' --method='greedy' --data_split='test' --max_sample_num=2 --eval_data='bio' --batch_size=32 --evaluation_type='gemini' # max_sample_num=128
python run.py --base_model='qwen2.5-3b' --method='dola' --data_split='test' --max_sample_num=2 --eval_data='bio' --batch_size=32 --evaluation_type='gemini'
python run.py --base_model='qwen2.5-3b' --method='icl' --data_split='test' --max_sample_num=2 --eval_data='bio' --train_data='bio' --standard_prompt_key='few_shot_bio' --batch_size=32 --evaluation_type='gemini'
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='bio' --noisy_prompt_key='opposite_bio' --batch_size=32 --evaluation_type='gemini' # ID
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='bio' --noisy_prompt_key='cad_bio' --batch_size=32 --evaluation_type='gemini' # CAD
python run.py --base_model='qwen2.5-3b' --method='adaptive' --data_split='test' --max_sample_num=2 --eval_data='bio' --train_data='bio' --batch_size=32 --evaluation_type='gemini' # RCD

# OOD
python run.py --base_model='qwen2.5-3b' --method='adaptive' --data_split='test' --max_sample_num=2 --eval_data='bio' --train_data='' --batch_size=32 --evaluation_type='gemini' # max_sample_num=128
python run.py --base_model='qwen2.5-3b' --method='greedy' --data_split='test' --max_sample_num=2 --eval_data='bio' --standard_prompt_key='few_shot_bio_ood' --batch_size=32 --evaluation_type='gemini'

# Wiki
python run.py --base_model='qwen2.5-3b' --method='greedy' --data_split='test' --max_sample_num=2 --eval_data='wiki' --evaluation_type='gemini' --batch_size=160 # max_sample_num=1000
python run.py --base_model='qwen2.5-3b' --method='dola' --data_split='test' --max_sample_num=2 --eval_data='wiki' --evaluation_type='gemini' --batch_size=160
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='wiki' --noisy_prompt_key='opposite_zero' --evaluation_type='gemini' --batch_size=160 # ID
python run.py --base_model='qwen2.5-3b' --method='instructive' --data_split='test' --max_sample_num=2 --eval_data='wiki' --noisy_prompt_key='cad_zero' --evaluation_type='gemini' --batch_size=160 # CAD
python run.py --base_model='qwen2.5-3b' --method='icl' --data_split='test' --max_sample_num=2 --eval_data='wiki' --train_data='wiki' --evaluation_type='gemini' --batch_size=160
python run.py --base_model='qwen2.5-3b' --method='adaptive' --data_split='test' --max_sample_num=2 --eval_data='wiki' --train_data='wiki' --evaluation_type='gemini' --batch_size=160 # RCD
```

All commands are also listed in `submission/run.sh` for batch execution.
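For a single dataset, the runs share most flags and differ only in the method and a few extra arguments. As an illustration, the TruthfulQA group could be generated as sketched below. The run labels (`id`, `cad`, `rcd`) and the `DRY_RUN` and `MAX_SAMPLE_NUM` variables are hypothetical names chosen for this sketch; the flag values are copied from the TruthfulQA commands above. By default nothing is executed and the commands are only printed.

```bash
# Hypothetical helper variables, not repository flags.
DRY_RUN="${DRY_RUN:-1}"                # 1 = print the commands only, 0 = also run them
MAX_SAMPLE_NUM="${MAX_SAMPLE_NUM:-2}"  # same value as in the commands above

# Method-specific flags for each TruthfulQA run, keyed by a hypothetical label.
extra_args() {
  case "$1" in
    greedy) echo "--method=greedy" ;;
    dola)   echo "--method=dola" ;;
    id)     echo "--method=instructive --noisy_prompt_key=opposite_zero" ;;
    cad)    echo "--method=instructive --noisy_prompt_key=cad_zero" ;;
    icl)    echo "--method=icl --train_data=truthful_qa" ;;
    rcd)    echo "--method=adaptive --train_data=truthful_qa" ;;
    *)      echo "unknown run label: $1" >&2; return 1 ;;
  esac
}

run_truthful_qa() {
  local label extra
  for label in greedy dola id cad icl rcd; do
    extra="$(extra_args "$label")" || return 1
    # $extra is intentionally unquoted so that it splits into separate flags.
    set -- python run.py --base_model=qwen2.5-3b --data_split=test \
      --max_sample_num="$MAX_SAMPLE_NUM" --eval_data=truthful_qa \
      --evaluation_type=gemini --batch_size=160 $extra
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
      "$@" || return 1
    fi
  done
}

run_truthful_qa
```

Keeping the per-method flags in one `case` statement makes it easy to see at a glance what distinguishes the ID, CAD, ICL, and RCD runs.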

## 📌 Notes
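* All commands assume Qwen2.5-3B as the base model (`--base_model='qwen2.5-3b'`).
* Modify `--max_sample_num` (in `run.py`) to control the number of evaluation samples, and `--sample_num` (in `precompute_space.py`) to control the number of samples used when precomputing the space.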
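
The evaluation commands carry trailing comments such as `# max_sample_num=417`. Assuming these are the sample counts intended for full runs (an interpretation, not something the commands themselves state), the per-dataset settings can be collected in one place. The function name `full_run_flags` is hypothetical, and the example only prints a command.

```bash
# Per-dataset values read off the evaluation commands and their trailing comments.
# Treating the commented max_sample_num values as full-run settings is an assumption.
full_run_flags() {
  case "$1" in
    truthful_qa) echo "--max_sample_num=417 --batch_size=160" ;;
    bio)         echo "--max_sample_num=128 --batch_size=32" ;;
    wiki)        echo "--max_sample_num=1000 --batch_size=160" ;;
    *)           echo "unknown dataset: $1" >&2; return 1 ;;
  esac
}

# Example: print a full-size greedy run on Biographies (nothing is executed here).
echo "python run.py --base_model=qwen2.5-3b --method=greedy --data_split=test --eval_data=bio --evaluation_type=gemini $(full_run_flags bio)"
```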