# Efficient CoT: Chain-of-Thought Training Framework

A comprehensive framework for training and evaluating language models with chain-of-thought reasoning capabilities, featuring RLVR training, evaluation pipelines, and data generation tools.

## 📋 Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [RLVR Training](#rlvr-training)
- [Evaluation](#evaluation)
- [SFT Finetuning](#sft-finetuning)
- [Data Generation](#data-generation)
- [Directory Structure](#directory-structure)
- [Configuration](#configuration)
- [Troubleshooting](#troubleshooting)

## 🎯 Overview

This repository contains a complete pipeline for training and evaluating language models with enhanced chain-of-thought reasoning capabilities. It includes:

- **RLVR Training**: Reinforcement learning with verifiable rewards (RLVR), built on verl, using the SLPO, LCPO, and GRPO algorithms
- **Evaluation**: Comprehensive evaluation framework for model performance assessment
- **SFT Finetuning**: Supervised fine-tuning using Llama Factory
- **Data Generation**: Tools for generating training datasets with GPT-4.1

## 🚀 Installation

### Prerequisites

- Python 3.10+
- CUDA-compatible GPU
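
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of this repository: the helper names (`version_ge`, `check_python`, `check_gpu`) are illustrative, and it assumes `sort -V` (GNU coreutils) is available.

```bash
#!/usr/bin/env bash
# Preflight sketch; helper names are illustrative and not shipped with this repo.

# Return 0 if dotted version $1 is >= dotted version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the interpreter version against the documented minimum (3.10).
check_python() {
    local py="${1:-python3}" ver
    command -v "$py" >/dev/null 2>&1 || { echo "missing: $py"; return 1; }
    ver="$("$py" -c 'import platform; print(platform.python_version())')"
    if version_ge "$ver" "3.10"; then
        echo "ok: Python $ver"
    else
        echo "too old: Python $ver (need 3.10+)"
        return 1
    fi
}

# Check that the NVIDIA driver tooling is visible.
# This does not prove that PyTorch can use the GPU, only that a driver is installed.
check_gpu() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found; no NVIDIA GPU driver detected"
        return 1
    fi
}
```

After sourcing the snippet, run `check_python && check_gpu`; a non-zero exit status means a prerequisite is missing.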

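### Environment Setup

RLVR training and evaluation each use their own virtual environment. Follow the installation steps in the corresponding section below.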

## 🏋️ RLVR Training

### Installation for RLVR Training

```bash
# Create and activate a virtual environment
uv venv
source .venv/bin/activate
# On Windows: .venv\Scripts\activate
cd l1

# Install verl framework
uv pip install -e verl

# Install additional dependencies
uv pip install packaging
uv pip install ninja
uv pip install flash-attn --no-build-isolation

# Install the project
uv pip install -e .
```

### Training Models

#### Example Training Script

```bash
# Submit training job (modify directories as needed)
sbatch l1/scripts/train/run_glrpo_ray_deepseek_r1_1.5b_SFT_30k_GRPO_range_2048.sbatch
```
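
To keep track of a submitted job, it can help to capture the job ID at submission time. The sketch below assumes a standard Slurm setup; `submit_job` and the `DRY_RUN` variable are illustrative names, not part of this repository.

```bash
# Sketch: submit a job and print its ID. `submit_job` and DRY_RUN are illustrative names.
submit_job() {
    local script="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it (useful off-cluster).
        echo "sbatch --parsable $script"
        return 0
    fi
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups, which `cut` strips).
    sbatch --parsable "$script" | cut -d';' -f1
}

# Usage on a Slurm login node (uncomment to run):
# job_id="$(submit_job l1/scripts/train/run_glrpo_ray_deepseek_r1_1.5b_SFT_30k_GRPO_range_2048.sbatch)"
# squeue -j "$job_id"             # show the job's queue state
# tail -f "slurm-${job_id}.out"   # default Slurm log name; the sbatch script may override it via --output
```

Setting `DRY_RUN=1` prints the command that would be run, which is a cheap way to confirm the script path before submitting.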

#### Reward Managers

- **SLPO and LCPO reward manager**: `l1/verl/verl/trainer/main_ppo_1.py`

#### Dataset Creation

```bash
# Create training dataset
python l1/scripts/data/deepscaler_dataset.py --local_dir <path_name>
```

## 📊 Evaluation

### Installation for Evaluation

```bash
cd evaluation/rllm

# Create and activate a virtual environment
uv venv
source .venv/bin/activate

# Install dependencies (uv venv does not install pip by default, so use uv pip)
uv pip install -e ./verl
uv pip install -e .
```

### Evaluation Scripts

#### For Models Trained via Verl

```bash
# Submit evaluation job
sbatch v3/efficient_cot/evaluation/rllm/eval_job_v2.sbatch
```

#### For Models Trained via TRL

```bash
# Submit evaluation job
sbatch v3/efficient_cot/evaluation/rllm/eval_job_v3.sbatch
```

#### Token-wise Statistics

```bash
# Get token-wise statistics
bash evaluation/rllm/utility_files/batch_process_stats.sh
```

## 🎓 SFT Finetuning

### Setup

1. **Install Llama Factory**: Install the latest version and note the directory where it lives
2. **Update Paths**: Edit the paths in the sbatch script to match your setup
3. **Configure Dataset**: Register your dataset by adding its path to `SFT/dataset_info.json`
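
As an illustration of step 3, the sketch below writes an example dataset entry to a scratch file and validates it. The dataset name `my_cot_dataset` and its file name are hypothetical. The keys follow the Alpaca-style convention used by Llama Factory's `dataset_info.json`, so check them against the version you install.

```bash
# Write an example entry to a scratch file; merge it into SFT/dataset_info.json by hand.
# "my_cot_dataset" and "my_cot_dataset.json" are hypothetical placeholders.
example_file="$(mktemp -d)/dataset_info.example.json"
cat > "$example_file" <<'EOF'
{
  "my_cot_dataset": {
    "file_name": "my_cot_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
EOF

# Validate the JSON syntax before copying the entry over.
python3 -m json.tool "$example_file" > /dev/null && echo "valid JSON: $example_file"
```

Writing to a scratch file first avoids corrupting the real `SFT/dataset_info.json` with a malformed entry.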

### Training

```bash
# Example SFT training
sbatch SFT/DeepScaleR_10k/agentica_24k_deepscaler_1.5b/mentalese_cot_lr_1e-6.sbatch
```

## 📈 Data Generation

### GPT-4.1 Dataset Generation

```bash
# Generate datasets from GPT-4.1
python utility_files/data_generation/deepscaleR_40k_generation_deepseek.py
```

For detailed instructions, see the [Data Generation README](utility_files/data_generation/README.md).

## 📁 Directory Structure

```
efficient_cot/
├── l1/                          # L1 training framework
│   ├── scripts/
│   │   ├── train/              # Training scripts
│   │   └── data/               # Data processing scripts
│   └── verl/                   # Verl framework
├── evaluation/
│   └── rllm/                   # Evaluation framework
│       ├── scripts/
│       └── utility_files/
├── SFT/                        # SFT finetuning
│   └── DeepScaleR_10k/
├── utility_files/
│   └── data_generation/        # Data generation tools
└── README.md
```


## 🔧 Configuration

### Environment Variables

Set the following environment variables for proper functionality:

```bash
export VLLM_ATTENTION_BACKEND=XFORMERS
export CUDA_VISIBLE_DEVICES=0,1,2,3  # Adjust based on your GPU setup
```
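
Launch scripts often need the number of visible GPUs as a separate setting. A minimal sketch for deriving it from `CUDA_VISIBLE_DEVICES` follows; the function name `count_visible_gpus` is illustrative and not part of this repository.

```bash
# Count the devices listed in CUDA_VISIBLE_DEVICES (comma-separated).
# Prints 0 when the variable is unset or empty.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local -a ids=()
    # Split on commas into an array; skipped when the list is empty.
    [ -n "$devices" ] && IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}
```

With `CUDA_VISIBLE_DEVICES=0,1,2,3` exported as above, `count_visible_gpus` prints `4`, which can then be passed to whichever GPU-count option your training script exposes.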

### Ray Configuration (for Multi-Node Training)

#### Head Node
```bash
export VLLM_ATTENTION_BACKEND=XFORMERS
ray start --head
```

#### Worker Nodes
```bash
export VLLM_ATTENTION_BACKEND=XFORMERS
ray start --address=<RAY_ADDRESS>  # head node address in host:port form
```
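
The worker command needs the head node's address in `host:port` form. The sketch below shows one way to build it, assuming Ray's default head port of 6379 and, for the commented lines, a Slurm allocation. The helper `ray_address` is an illustrative name, not part of this repository.

```bash
# Build a host:port address for `ray start --address=...`.
# The port defaults to 6379, the default port of `ray start --head`.
ray_address() {
    local host="$1" port="${2:-6379}"
    echo "${host}:${port}"
}

# Inside a Slurm allocation, the first hostname in the node list can serve as the head node:
# head_host="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n1)"
# On the head node:     ray start --head --port=6379
# On each worker node:  ray start --address="$(ray_address "$head_host")"
# From any node:        ray status   # lists the nodes that have joined the cluster
```

Running `ray status` after the workers start is a quick way to confirm that every node has joined before launching training.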

## 🚨 Troubleshooting

### Common Issues

1. **CUDA Memory Errors**: Reduce the batch size or enable gradient checkpointing
2. **Ray Connection Issues**: Check that worker nodes can reach the head node's address and port in multi-node setups
3. **Flash Attention Installation**: Pass the `--no-build-isolation` flag to the install command if the build fails

### Performance Tips

- Set `VLLM_ATTENTION_BACKEND=XFORMERS` to use the xFormers attention backend, which can reduce memory use
- Enable gradient checkpointing for large models
- Use appropriate tensor parallelism for multi-GPU setups
- Monitor GPU memory usage during training
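
For the last tip, GPU memory can be polled with `nvidia-smi`. The following is a minimal sketch; `format_gpu_mem` is an illustrative helper name, and the 5-second interval is an arbitrary choice.

```bash
# Convert CSV lines of the form "index, used_MiB, total_MiB"
# into "GPU <index>: <pct>% used" (percentage truncated to an integer).
format_gpu_mem() {
    awk -F', *' '{ printf "GPU %s: %d%% used\n", $1, ($2 * 100) / $3 }'
}

# Poll every 5 seconds (requires nvidia-smi; stop with Ctrl+C):
# while sleep 5; do
#     nvidia-smi --query-gpu=index,memory.used,memory.total \
#                --format=csv,noheader,nounits | format_gpu_mem
# done
```

Usage that stays close to 100% during training suggests reducing the batch size or enabling gradient checkpointing, as noted above.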

## 📝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

## 🤝 Acknowledgments

- Built on top of the [verl](https://github.com/volcengine/verl) framework
- Uses [Llama Factory](https://github.com/hiyouga/LLaMA-Factory) for SFT training
- Evaluation framework based on [rLLM](https://github.com/agentica-project/rllm)

## 📞 Support

For questions and support:
- Create an issue in the repository
- Check the documentation in each subdirectory
- Review the troubleshooting section above
