# When should I search more: Adaptive Complex Query Optimization with Reinforcement Learning

## 📖 Overview

**ACQO (Adaptive Complex Query Optimization)** is a reinforcement learning framework for optimizing complex queries in Retrieval-Augmented Generation (RAG) systems. Unlike existing approaches that focus on expanding a single query, ACQO adaptively handles complex real-world queries that require multiple parallel and sequential searches.

### Framework Architecture

<div align="center">
  <img src="assets/Overview.png" alt="ACQO Framework Overview" width="800"/>
  <p><em>Figure 1: ACQO Framework Architecture showing the Adaptive Query Reformulation (AQR) module, Rank-Score Fusion (RSF) module, and Curriculum Reinforcement Learning (CRL) training strategy.</em></p>
</div>

### Key Features

- **🔄 Adaptive Query Reformulation (AQR)**: Dynamically decides when to decompose queries into sub-queries
- **🔀 Rank-Score Fusion (RSF)**: Robustly aggregates the results retrieved for multiple sub-queries and provides stable reward signals for RL training
- **📚 Curriculum Reinforcement Learning (CRL)**: Two-stage training strategy for improved stability
- **⚡ High Efficiency**: Improved computational efficiency with broad retrieval architecture compatibility
- **🎯 State-of-the-Art**: Superior performance on three complex query benchmarks
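
To make the fusion idea concrete, here is a minimal sketch of merging the ranked lists returned for several sub-queries into one ranking. It is an illustration only, not the RSF implementation from this repository: the function names, the `alpha` weight, and the choice of reciprocal rank plus min-max-normalized score are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[str, float]  # (doc_id, retriever score), best hit first


def min_max(scores: List[float]) -> List[float]:
    """Scale scores to [0, 1]; a constant list maps to all ones."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse(result_lists: List[List[Hit]], alpha: float = 0.5, k: int = 10) -> List[Hit]:
    """Fuse per-sub-query rankings with a weighted sum of two signals.

    alpha weights the reciprocal-rank term and (1 - alpha) weights the
    normalized retriever score. Both choices are assumptions made for
    illustration.
    """
    fused: Dict[str, float] = defaultdict(float)
    for hits in result_lists:
        if not hits:
            continue  # a sub-query that retrieved nothing contributes nothing
        normalized = min_max([score for _, score in hits])
        for rank, ((doc_id, _), norm_score) in enumerate(zip(hits, normalized), start=1):
            fused[doc_id] += alpha * (1.0 / rank) + (1.0 - alpha) * norm_score
    # Sort by fused score (descending), breaking ties by doc_id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Normalizing scores per list keeps retrievers with different score scales (for example BM25 and a dense encoder) comparable, and documents found by several sub-queries accumulate a higher fused score.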

## 🏗️ Repository Structure

```
ACQO/
├── data/                         # Dataset storage and preprocessing
├── evaluation/                   # Evaluation scripts and metrics
├── local_index_search/           # Local search index utilities
├── recipe/                       # Training configurations and recipes
│   ├── dapo/                     # DAPO algorithm configurations
│   ├── drgrpo/                   # Dr. GRPO algorithm configurations
│   ├── prime/                    # PRIME algorithm configurations
│   ├── r1/                       # R1 algorithm configurations
│   └── sppo/                     # SPPO algorithm configurations
├── scripts/                      # Training and execution scripts
├── search_launch/                # Retrieval service launch scripts
│   ├── ance/                     # ANCE retrieval service
│   └── lucene/                   # Lucene retrieval service
├── src/                          # Core source code
├── test_data/                    # Test datasets and examples
└── verl/                         # VERL framework components
    ├── models/                   # Model implementations
    │   ├── llama/                # LLaMA model support
    │   ├── mcore/                # Megatron core integration
    │   ├── qwen2/                # Qwen2 model support
    │   └── transformers/         # Transformers integration
    ├── single_controller/        # Single controller implementations
    ├── third_party/              # Third-party integrations
    │   ├── sglang/               # SGLang integration
    │   └── vllm/                 # vLLM integration
    ├── tools/                    # Utility tools
    ├── trainer/                  # Training framework
    │   ├── config/               # Training configurations
    │   └── ppo/                  # PPO trainer implementation
    ├── utils/                    # Utility functions
    │   ├── checkpoint/           # Checkpoint management
    │   ├── dataset/              # Dataset utilities
    │   ├── debug/                # Debugging tools
    │   ├── logger/               # Logging utilities
    │   ├── megatron/             # Megatron utilities
    │   ├── metric/               # Evaluation metrics
    │   ├── rendezvous/           # Distributed training utilities
    │   └── reward_score/         # Reward scoring functions
    ├── version/                  # Version management
    └── workers/                  # Distributed worker implementations
        ├── actor/                # Actor workers
        ├── critic/               # Critic workers
        ├── reward_manager/       # Reward management
        ├── reward_model/         # Reward model workers
        ├── rollout/              # Rollout workers
        └── sharding_manager/     # Sharding management
```

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- CUDA 11.8+ (for GPU support)
- 16GB+ RAM recommended
- Multiple GPUs recommended for distributed training

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/your-org/ACQO.git
cd ACQO
```

2. **Create a conda environment**
```bash
conda create -n acqo python=3.8
conda activate acqo
```

3. **Install dependencies**
```bash
pip install -r requirements.txt
```

4. **Install the package**
```bash
pip install -e .
```

### Data Preparation

1. **Download datasets**
```bash
# Download and prepare evaluation datasets
```

2. **Build search indices**
```bash
# Build local search indices
```

## 🎯 Usage

### Training

#### Stage 1: Curriculum Learning - Simple Queries
```bash
# Train on simple queries first
python scripts/train_stage1.py \
    --config recipe/r1/config/stage1.yaml \
    --output_dir outputs/stage1 \
    --num_gpus 4
```

#### Stage 2: Curriculum Learning - Complex Queries
```bash
# Train on complex queries
python scripts/train_stage2.py \
    --config recipe/r1/config/stage2.yaml \
    --checkpoint outputs/stage1/best_model.pt \
    --output_dir outputs/stage2 \
    --num_gpus 4
```
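
The two stages train on simple queries first and then on complex queries, with stage 2 starting from the stage 1 checkpoint. The sketch below shows one way a dataset could be partitioned for such a curriculum. The `Example` type, the `num_sub_queries` annotation, and the scoring rule are hypothetical and are not taken from this repository; only the default threshold of 0.5 mirrors `complexity_threshold` in the example configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    query: str
    num_sub_queries: int  # hypothetical annotation, not a field from this repo


def complexity(example: Example, max_sub_queries: int = 5) -> float:
    """Hypothetical complexity score in [0, 1]: more sub-queries means harder."""
    if max_sub_queries < 2:
        raise ValueError("max_sub_queries must be at least 2")
    capped = min(max(example.num_sub_queries, 1), max_sub_queries)
    return (capped - 1) / (max_sub_queries - 1)


def split_curriculum(
    examples: List[Example],
    threshold: float = 0.5,
    score: Callable[[Example], float] = complexity,
) -> Tuple[List[Example], List[Example]]:
    """Partition examples into (simple, complex) around the threshold."""
    simple = [e for e in examples if score(e) <= threshold]
    hard = [e for e in examples if score(e) > threshold]
    return simple, hard
```

Under these assumptions, the first list would feed stage 1 and the second list would feed stage 2.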

### Retrieval Service Setup

#### Launch Lucene Service
```bash
cd search_launch/lucene
python launch_lucene_server.py \
    --port 8080 \
    --index_path ../../data/indices/lucene
```

#### Launch ANCE Service
```bash
cd search_launch/ance
python launch_ance_server.py \
    --port 8081 \
    --model_path ../../models/ance \
    --index_path ../../data/indices/ance
```
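
Both services need to be reachable before training or evaluation starts. The following is a minimal readiness check, assuming the servers run on the local machine at the ports used above (8080 for Lucene, 8081 for ANCE). It only tests that the TCP port accepts connections and makes no assumption about the services' request format, which is not documented here; `wait_for_port` is a helper name introduced for this example.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port("127.0.0.1", 8080)` could be called before launching a training script, and the script aborted if it returns `False`.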

### Benchmark Evaluation
```bash
# Evaluate on all benchmarks
bash eval.sh
```




## 📊 Experimental Results

### Performance on TopiOCQA Benchmark

#### Sparse Retrieval (BM25)
| Method | MRR@3 | NDCG@3 | R@10 | R@100 |
|--------|-------|--------|------|-------|
| DeepSeek-V3.1 | 15.5 | 17.0 | 36.7 | 65.3 |
| IterCQR (T5-base) | 16.5 | 14.9 | 29.3 | 54.1 |
| AdaCQR (T5-base+LLaMA7B) | 28.3 | 26.5 | 48.9 | 71.2 |
| LLM4CS-RAR (ChatGPT) | 27.9 | 26.4 | 48.4 | 71.1 |
| CHIQ-Fusion (T5-base+LLaMA2-7B) | 25.6 | 23.5 | 44.7 | - |
| RETPO (LLaMA2-7B) | 28.3 | 26.5 | 48.3 | 73.1 |
| AdaQR (T5-base) | 20.3 | 18.0 | 37.1 | 66.2 |
| ConvSearch-R1 (Qwen2.5-3B) | **37.8** | 36.2 | 59.6 | 80.1 |
| **ACQO (Ours, Qwen2.5-3B)** | 34.9 | **37.7** | **62.6** | **83.2** |

#### Dense Retrieval (ANCE)
| Method | MRR@3 | NDCG@3 | R@10 | R@100 |
|--------|-------|--------|------|-------|
| DeepSeek-V3.1 | 28.4 | 30.8 | 56.3 | 77.8 |
| IterCQR (T5-base) | 26.3 | 25.1 | 42.6 | 62.0 |
| AdaCQR (T5-base+LLaMA7B) | 38.5 | 37.6 | 58.4 | 75.0 |
| LLM4CS-RAR (ChatGPT) | 35.4 | 34.4 | 55.2 | 72.2 |
| CHIQ-Fusion (T5-base+LLaMA2-7B) | 38.0 | 37.0 | 61.6 | - |
| RETPO (LLaMA2-7B) | 30.0 | 28.9 | 49.6 | 68.7 |
| AdaQR (T5-base) | 38.1 | 36.6 | 61.3 | 79.9 |
| ConvSearch-R1 (Qwen2.5-3B) | **50.5** | **50.1** | **72.0** | **86.3** |
| **ACQO (Ours, Qwen2.5-3B)** | 36.6 | 39.4 | 65.6 | 85.1 |

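**Key Observations:**
- With BM25, ACQO achieves the best NDCG@3 (37.7), R@10 (62.6), and R@100 (83.2), while ConvSearch-R1 leads on MRR@3 (37.8 vs. 34.9).
- With ANCE, ConvSearch-R1 leads on all four metrics; ACQO ranks second on NDCG@3 (39.4), R@10 (65.6), and R@100 (85.1).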
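
For readers unfamiliar with the metrics in the tables, the sketch below computes MRR@k, Recall@k (R@k), and NDCG@k for a single query using the standard binary-relevance definitions. It is not the evaluation code from `evaluation/`; the function names are illustrative, and the tables presumably report these values averaged over queries and scaled to percentages.

```python
import math
from typing import List, Set


def mrr_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

MRR@3 and NDCG@3 reward placing a relevant passage near the top of the list, whereas R@10 and R@100 only measure whether it was retrieved at all, which is why a method can lead on recall while trailing on MRR@3.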


## 🔧 Configuration

### Training Configuration Example

```yaml
# recipe/r1/config/stage1.yaml
model:
  name: "Qwen2.5-3B"
  max_length: 512
  
training:
  batch_size: 16
  learning_rate: 1e-5
  num_epochs: 10
  warmup_steps: 1000
  
aqr_module:
  max_subqueries: 5
  decomposition_threshold: 0.7
  
rsf_module:
  fusion_method: "weighted_sum"
  score_normalization: true
  
curriculum:
  stage1_epochs: 5
  complexity_threshold: 0.5
```
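
Before launching a long run, it can help to sanity-check a configuration like the one above. The sketch below operates on the configuration after it has been parsed into a plain dictionary. The helper `check_config` and every constraint it enforces (thresholds in [0, 1], `stage1_epochs` not exceeding `num_epochs`) are assumptions about what these fields mean, not rules taken from the training code.

```python
from typing import Any, Dict, List


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems: List[str] = []
    training = cfg.get("training", {})
    aqr = cfg.get("aqr_module", {})
    curriculum = cfg.get("curriculum", {})

    if int(training.get("batch_size", 1)) <= 0:
        problems.append("training.batch_size must be positive")
    # float() also handles loaders (for example PyYAML) that read 1e-5 as a string.
    if float(training.get("learning_rate", 1e-5)) <= 0:
        problems.append("training.learning_rate must be positive")
    if int(aqr.get("max_subqueries", 1)) < 1:
        problems.append("aqr_module.max_subqueries must be at least 1")

    # Assumed semantics: both thresholds are probabilities or normalized scores.
    for section, key in [
        ("aqr_module", "decomposition_threshold"),
        ("curriculum", "complexity_threshold"),
    ]:
        value = cfg.get(section, {}).get(key)
        if value is not None and not 0.0 <= float(value) <= 1.0:
            problems.append(f"{section}.{key} should lie in [0, 1]")

    # Assumed semantics: stage 1 is a prefix of the total epoch budget.
    num_epochs = training.get("num_epochs")
    stage1_epochs = curriculum.get("stage1_epochs")
    if num_epochs is not None and stage1_epochs is not None and stage1_epochs > num_epochs:
        problems.append("curriculum.stage1_epochs exceeds training.num_epochs")
    return problems
```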

### Retrieval Service Configuration

```yaml
# search_launch/lucene/config.yaml
lucene:
  index_path: "data/indices/lucene"
  analyzer: "standard"
  similarity: "bm25"
  
server:
  host: "0.0.0.0"
  port: 8080
  workers: 4
```


## 🙏 Acknowledgments

- Thanks to the open-source community for the foundational tools and libraries
- Special thanks to the VERL framework contributors
- Inspired by recent advances in retrieval-augmented generation and reinforcement learning


## 🔗 Related Work

- [VERL Framework](https://github.com/volcengine/verl)
- [Retrieval-Augmented Generation](https://arxiv.org/abs/2005.11401)
- [Dense Passage Retrieval](https://arxiv.org/abs/2004.04906)
- [ColBERT](https://arxiv.org/abs/2004.12832)

---
