# EoG: Knowledge Graph Question Answering with Reinforcement Learning

This repository contains the implementation for knowledge graph question answering using reinforcement learning techniques, specifically focusing on the EoG (Entity-oriented Graph reasoning) approach.

## 📁 Project Structure

### Root Directory

- **`requirements.txt`**: Python dependencies for the project
- **`reward_func.py`**: Custom reward function implementation for knowledge graph QA tasks
- **`run_rog_cwq.sh`**: Shell script for running ROG-CWQ experiments
- **`test.sh`**: Test script for model evaluation

### 📂 Core Directories

#### `data/` - Data Processing
Contains scripts for processing knowledge graph question-answering datasets:
- **`EoG_process.py`**: Main data processing script for EoG dataset, converts raw data to training format
- **`kg_qa_sft_process.py`**: Processes knowledge graph QA data for supervised fine-tuning (SFT), converts JSONL to multiturn conversation format

#### `sft_data/` - Supervised Fine-tuning Data
Contains preprocessed training data in JSONL format:
- **`2wikimultihop_train_sft.jsonl`**: 2WikiMultihopQA training data
- **`rog_cwq_filtered_train_0.1_with_sft_new_add_list.jsonl`**: ROG-CWQ filtered training data
- **`rog_webqsp_filtered_train_0.1_with_sft_full.jsonl`**: ROG-WebQSP filtered training data
- **`Grailqa_train_full_SFT.jsonl`**: GrailQA training data

#### `test/` - Evaluation Scripts
Contains evaluation and testing scripts:
- **`eog_eval.py`**: Main evaluation script for EoG model, uses LLM API for reasoning evaluation
- **`eog_reasoning_eval.py`**: Comprehensive reasoning evaluation script with multi-process support

#### `train/` - Training Scripts
Contains training configuration scripts:
- **`run_cwq_sft.sh`**: Script for running supervised fine-tuning on CWQ dataset
- **`run_eog.sh`**: Script for running EoG training with GRPO algorithm

#### `recipe/` - Training Recipes
Contains specific training recipes and configurations:
- **`dapo/`**: DAPO (Data-Augmented Policy Optimization) recipe
  - **`main_dapo.py`**: Main entry point for DAPO training
  - **`dapo_ray_trainer.py`**: Ray-based distributed trainer for DAPO
  - **`config/dapo_trainer.yaml`**: DAPO training configuration
  - **`runtime_env.yaml`**: Runtime environment configuration

#### `scripts/` - Utility Scripts
Contains various utility and helper scripts:
- **`converter_hf_to_mcore.py`**: Converts HuggingFace models to Megatron format
- **`diagnose.py`**: Diagnostic tools for debugging
- **`generate_trainer_config.sh`**: Generates trainer configuration files
- **`init_random_model.py`**: Initializes random model weights
- **`install_vllm_sglang_mcore.sh`**: Installation script for vLLM, SGLang, and Megatron
- **`legacy_model_merger.py`**: Merges model checkpoints (legacy version)
- **`print_cfg.py`**: Prints configuration information
- **`rollout_viewer.py`**: Visualizes rollout data

#### `verl/` - Core Framework
The main framework directory containing the VERL (Versatile Reinforcement Learning) library:

- **`base_config.py`**: Base configuration classes
- **`protocol.py`**: Communication protocols

- **`experimental/`**: Experimental features
  - **`agent_loop/`**: Agent loop implementations
  - **`dataset/`**: Experimental dataset utilities
  - **`dynamic_dataset/`**: Dynamic dataset handling

- **`interactions/`**: Interaction modules
  - **`base.py`**: Base interaction class
  - **`gsm8k_interaction.py`**: GSM8K interaction implementation
  - **`utils/`**: Interaction utilities

- **`model_merger/`**: Model merging utilities
  - Model checkpoint merging for FSDP and Megatron backends

- **`models/`**: Model implementations
  - 57 model-related files covering various architectures and utilities

- **`single_controller/`**: Single controller implementations
  - Base classes and Ray-based implementations for distributed training

- **`third_party/`**: Third-party integrations

- **`tools/`**: Tool implementations
  - 14 tool files for various utilities (search, sandbox fusion, etc.)

- **`trainer/`**: Training framework
  - **`config/`**: Configuration files (23 YAML files)
    - PPO trainer configs
    - SFT trainer configs
    - Actor, critic, rollout, and reward model configs
  - **`main_ppo.py`**: Main PPO training entry point
  - **`main_generation.py`**: Generation script
  - **`main_eval.py`**: Evaluation script
  - **`fsdp_sft_trainer.py`**: FSDP-based SFT trainer
  - **`ppo/`**: PPO algorithm implementations

- **`utils/`**: Utility functions
  - 79 utility files covering:
    - Dataset processing
    - Megatron utilities
    - FSDP utilities
    - File system operations
    - Profiling and debugging
    - Reward scoring
    - And more

- **`workers/`**: Worker implementations
  - 54 worker files for:
    - Rollout workers (vLLM, SGLang, HuggingFace)
    - Reward model workers
    - Actor and critic workers
    - Engine implementations (FSDP, Megatron)

#### `tests/` - Test Suite
Comprehensive test suite organized by category:

- **`experimental/`**: Tests for experimental features
- **`interactions/`**: Tests for interaction modules
- **`models/`**: Model tests
- **`single_controller/`**: Single controller tests
- **`special_distributed/`**: Distributed training tests
- **`special_e2e/`**: End-to-end tests
- **`special_npu/`**: NPU-specific tests
- **`special_sanity/`**: Sanity checks and validation
- **`special_standalone/`**: Standalone tests
- **`trainer/`**: Trainer tests
- **`utils/`**: Utility function tests
- **`workers/`**: Worker tests
- **`tools/`**: Tool tests

## 🔧 Key Components

### Data Processing Pipeline
1. **Raw Data** → `data/EoG_process.py` or `data/kg_qa_sft_process.py`
2. **Processed Data** → Stored in `sft_data/` directory
3. **Training** → Uses processed data for SFT or RL training

### Training Pipeline
1. **SFT Training**: `train/run_cwq_sft.sh` → Supervised fine-tuning
2. **RL Training**: `train/run_eog.sh` or `recipe/dapo/` → Reinforcement learning with GRPO/DAPO
3. **Evaluation**: `test/eog_eval.py` or `test/eog_reasoning_eval.py`

### Reward Function
The `reward_func.py` implements custom reward computation for knowledge graph QA tasks, evaluating model outputs against ground truth answers and reasoning paths.

## 🚀 Quick Start

1. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Process Data**:
   ```bash
   python data/EoG_process.py --input_path <path_to_input_data> --output_dir <output_dir>
   ```

3. **Train Model**:
   ```bash
   bash train/run_eog.sh
   ```

4. **Evaluate**:
   ```bash
   python test/eog_eval.py
   ```

## 📝 Notes

- All paths in the codebase use placeholders (e.g., `<path_to_input_data>`) for anonymization
- The framework supports both FSDP and Megatron backends for distributed training
- Multiple rollout engines are supported: HuggingFace, vLLM, and SGLang
- The project uses Ray for distributed computing and Hydra for configuration management

