# ECHO: Efficient Clipped Hierarchical Optimization for LLM Alignment

This repository contains the official implementation of **ECHO**, a policy optimization method for aligning large language models (LLMs) via reinforcement learning. All algorithms are implemented on top of the **VerL** framework. The core algorithms (ECHO, GSPO, GRPO, DAPO) are contained in `core_algos.py`.

## Datasets

- **Training data**: OpenR1-Math-46k-8192  
  - [Hugging Face link](https://huggingface.co/datasets/Elliott/Openr1-Math-46k-8192)
- **Evaluation data**: 9 validation sets, including AIME24/25, Math500, AMC, Olympiad, Minerva, ARC, GPQA, and MMLU-Pro  
- **Evaluation protocol**: unified evaluation using `math_verify`

## Core Algorithms

- **ECHO**: Efficient Clipped Hierarchical Optimization  
- **GSPO**: Group Sequence Policy Optimization  
- **GRPO**: Group Relative Policy Optimization  
- **DAPO**: Decoupled Clip and Dynamic Sampling Policy Optimization  

All policy update implementations are contained in `core_algos.py`. Users do not need to modify this file for standard training runs.
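
For orientation, the baseline methods listed above differ mainly in how they form importance ratios and clip them. The sketch below is illustrative only and is not the code in `core_algos.py`: the function names, the use of population standard deviation, and the default clip ranges are all assumptions, and ECHO's own update rule is not shown. It covers group-normalized advantages (GRPO), token-level versus length-normalized sequence-level ratios (GRPO/DAPO versus GSPO), and a clipped surrogate with separate lower and upper bounds (as in DAPO's decoupled clipping).

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group.

    Assumption: population std is used here; implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_ratios(new_logps, old_logps):
    """Token-level importance ratios pi_new / pi_old, one per token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps, old_logps):
    """GSPO-style sequence ratio: exp of the mean token log-ratio,
    i.e. the sequence likelihood ratio raised to the power 1/length."""
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_surrogate(ratio, advantage, eps_low=0.2, eps_high=0.2):
    """PPO-style clipped objective for one ratio/advantage pair.

    eps_low and eps_high may differ (decoupled clipping, as in DAPO).
    The default values are placeholders, not this repository's settings.
    """
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Hypothetical group of four sampled responses with binary rewards.
    advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    new_lp, old_lp = [-1.0, -2.0], [-1.5, -1.5]
    print(advantages)
    print(token_ratios(new_lp, old_lp), sequence_ratio(new_lp, old_lp))
    print(clipped_surrogate(1.5, advantages[0]))
```

With token-level ratios, each token is clipped independently. With the sequence-level ratio, a single clipped value applies to the whole response.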

## Training Scripts

- `Qwen2.5_Math_7B_OpenR1_echo.sh` — ECHO training  
- `Qwen2.5_Math_7B_OpenR1_gspo.sh` — GSPO training  
- `Qwen2.5_Math_7B_OpenR1_grpo.sh` — GRPO training  
- `Qwen2.5_Math_7B_OpenR1_dapo.sh` — DAPO training  

> **Note**: The training scripts, like the core algorithms, run on top of the **VerL** framework.
