# Ambiguity-Aware Question Answering with Reinforcement Learning

This repository contains the **core modifications** used in our paper:

**Ambiguity-Aware Question Answering with Reinforcement Learning**

We build our reinforcement learning (RL) training pipeline on top of the `verl` framework.  
This repo provides only the **essential changes** to the relevant files, covering reward design, adaptive entropy control, the rollout pipeline, and the training setup.

---

## Code Overview

### Reward Implementation
- **`f1_score.py`**, **`searchrl_reward.py`**, **`format_validation.py`**, **`searchrl_parallel.py`**  
  Implement the RL reward computation logic and the reward manager.
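
As a rough illustration of the kind of answer-level score a file like `f1_score.py` typically computes, here is a minimal sketch of SQuAD-style token-level F1. It is an assumption, not the code shipped in this repo: the function names (`normalize_answer`, `token_f1`, `best_f1`), the normalization steps, and the max-over-references handling of multiple valid answers are all hypothetical and may differ from the actual reward.

```python
import re
import string
from collections import Counter
from typing import Sequence


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between one predicted answer and one reference answer."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: Sequence[str]) -> float:
    """Hypothetical helper: score against several acceptable answers, keep the best."""
    return max((token_f1(prediction, ref) for ref in references), default=0.0)
```

For example, `token_f1("The Eiffel Tower", "eiffel tower")` treats the two strings as identical after normalization, while a prediction that only partially overlaps a reference receives partial credit.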

### Adaptive Entropy Control
- **`ray_trainer.py`**, **`core_algos.py`**, **`dp_actor.py`**  
  Contain modifications introducing **adaptive entropy control** into training.
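
This README does not describe the update rule itself. One common form of adaptive entropy control adjusts the entropy-bonus coefficient so that policy entropy stays near a target value; the sketch below shows that generic idea only. Every name and default here (`update_entropy_coeff`, `target_entropy`, `step_size`, the clamping bounds, `loss_with_entropy_bonus`) is a hypothetical placeholder and should not be read as the rule implemented in `core_algos.py` or `dp_actor.py`.

```python
def update_entropy_coeff(
    coeff: float,
    entropy: float,
    target_entropy: float,
    step_size: float = 0.01,
    min_coeff: float = 0.0,
    max_coeff: float = 1.0,
) -> float:
    """Nudge the entropy coefficient toward keeping entropy near the target.

    Assumed rule (not taken from the paper): raise the coefficient when the
    measured entropy is below the target, lower it when above, then clamp.
    """
    if entropy < target_entropy:
        coeff += step_size
    elif entropy > target_entropy:
        coeff -= step_size
    return min(max(coeff, min_coeff), max_coeff)


def loss_with_entropy_bonus(pg_loss: float, entropy: float, coeff: float) -> float:
    """Standard entropy-regularized objective: subtract the weighted entropy."""
    return pg_loss - coeff * entropy
```

In a training loop, a controller of this shape would be called once per update step with the batch-mean entropy, and the returned coefficient would be used in the next actor update.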

### Rollout Pipeline
- **`searchrl_rollout_spmd.py`**  
  Implements the distributed rollout pipeline for RL training.

### Training Parameters
- **`ppo_trainer.yaml`**  
  Updated training hyperparameters and configuration.

### Prompt Templates
- **`template.py`**  
  Stores the prompts used during training.

### Debugging & Examples
- **`train.sh`**  
  Example script to run and debug the training loop.

---

## Dataset

- **`a2search_musique_2wiki_nq.parquet`**  
  Our constructed dataset for training and evaluation.

---

## Notes
This repository is not a full implementation of the RL framework; it provides only the **core modifications** needed to reproduce the methods in the paper.  
All base functionality comes from the `verl` framework.
