# MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

MUA-RL is a reinforcement learning framework for training large language models to hold multi-turn conversations with users and use tools.

## Installation

### Quick Install

```bash
# Change into the cloned repository
cd MUA-code

# Install dependencies
pip install -e .
pip install -r requirements_sglang.txt
pip install transformers==4.51.1
```
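The install step pins `transformers==4.51.1` after installing the other requirements, so a later dependency could silently pull in a different version. Below is a minimal post-install check. The helper names are hypothetical, and the check assumes that `python` on your `PATH` is the interpreter you installed into.

```bash
#!/usr/bin/env bash
# Post-install sanity check: confirm Python imports the pinned transformers version.
EXPECTED_TRANSFORMERS="4.51.1"   # pin from the install step above

# Hypothetical helper: print the installed transformers version, or "missing".
installed_transformers_version() {
  python -c 'import transformers; print(transformers.__version__)' 2>/dev/null || echo "missing"
}

# Hypothetical helper: succeed only when the two version strings match exactly.
version_matches() {
  [ "$1" = "$2" ]
}

installed="$(installed_transformers_version)"
if version_matches "$installed" "$EXPECTED_TRANSFORMERS"; then
  echo "transformers $installed OK"
else
  echo "warning: expected transformers $EXPECTED_TRANSFORMERS, found $installed" >&2
fi
```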

## Quick Start

### 1. Configure Training

Edit the training script parameters:

```bash
# Edit model path and other parameters in the script
vim examples/sglang_multiturn/mua_32b.sh
```

Key parameters to modify:
- `MODEL_PATH`: Path to your base model
- `N_NODE`: Number of nodes for distributed training
- `BATCH_SIZE`: Training batch size
- `EPOCH_NUM`: Number of training epochs
- `API_KEY`: OpenAI API key for the evaluation model (e.g., GPT-4o)
- `BASE_URL`: OpenAI API base URL
- `CKPT_DIR`: Directory path to save model checkpoints
- `TENSORBOARD_DIR`: Directory path for TensorBoard logs
- `ROLLOUT_LOG_PATH`: Directory path for rollout generation logs
- `VALID_LOG_PATH`: Directory path for validation logs
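The parameters above are shell variables inside `mua_32b.sh`. The sketch below shows one possible layout for such a block, followed by a small sanity check. The variable names come from the list above. Everything else is hypothetical and may not match the actual script: every value, the `${VAR:-default}` environment overrides, and the `missing_params`/`is_positive_int` helpers. Reading `API_KEY` from the environment keeps the key out of version control.

```bash
#!/usr/bin/env bash
# Illustrative parameter block: names follow the README, values are placeholders.
MODEL_PATH="${MODEL_PATH:-/path/to/your/base_model}"
N_NODE="${N_NODE:-4}"                     # 4 nodes x 8 GPUs in the reference setup
BATCH_SIZE="${BATCH_SIZE:-64}"            # hypothetical value
EPOCH_NUM="${EPOCH_NUM:-1}"               # hypothetical value
API_KEY="${API_KEY:-}"                    # read from the environment; never commit keys
BASE_URL="${BASE_URL:-https://api.openai.com/v1}"
CKPT_DIR="${CKPT_DIR:-/path/to/checkpoints}"
TENSORBOARD_DIR="${TENSORBOARD_DIR:-/path/to/tensorboard}"
ROLLOUT_LOG_PATH="${ROLLOUT_LOG_PATH:-/path/to/rollout_logs}"
VALID_LOG_PATH="${VALID_LOG_PATH:-/path/to/valid_logs}"

# Hypothetical helper: print the name of every required variable that is empty.
missing_params() {
  local name
  for name in MODEL_PATH N_NODE BATCH_SIZE EPOCH_NUM API_KEY BASE_URL \
              CKPT_DIR TENSORBOARD_DIR ROLLOUT_LOG_PATH VALID_LOG_PATH; do
    [ -n "${!name}" ] || echo "$name"    # ${!name} expands the variable named by $name
  done
}

# Hypothetical helper: succeed only for positive integers.
is_positive_int() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

missing="$(missing_params)"
if [ -n "$missing" ]; then
  echo "unset parameters: $missing" >&2
fi
for name in N_NODE BATCH_SIZE EPOCH_NUM; do
  is_positive_int "${!name}" || echo "$name must be a positive integer" >&2
done
```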

### 2. Run Training

#### Multi-Node Training (4 Nodes × 8 GPUs)

```bash
# 4 nodes x 8 GPUs; H200 (141 GB) GPUs are recommended
bash examples/sglang_multiturn/mua_32b.sh
```
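Before launching, check that each node actually exposes the eight GPUs the 4 × 8 setup assumes. The pre-flight sketch below is hypothetical and not part of the repository. It relies only on `nvidia-smi -L`, which prints one `GPU N: ...` line per device, and it treats a missing `nvidia-smi` as zero GPUs.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; run on every node before launching training.
GPUS_PER_NODE=8   # from the 4 x 8 GPU setup above

# Count GPUs visible on this node (0 if nvidia-smi is unavailable).
local_gpu_count() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L | grep -c '^GPU '
  else
    echo 0
  fi
}

# Total GPUs for a given number of nodes.
total_gpus() {
  echo $(( $1 * GPUS_PER_NODE ))
}

found="$(local_gpu_count)"
if [ "$found" -ne "$GPUS_PER_NODE" ]; then
  echo "warning: expected $GPUS_PER_NODE GPUs on this node, found $found" >&2
fi
echo "total GPUs for N_NODE=4: $(total_gpus 4)"
```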

### 3. Convert Checkpoints to Hugging Face Format

After training, convert distributed checkpoints to Hugging Face format:

```bash
# Edit the merge script configuration
vim scripts/merge.sh

# In scripts/merge.sh, set your checkpoint directory and model name:
BASE_DIR="/path/to/your/checkpoints/"
MODEL_NAME="your_model_name"

# Run the conversion
bash scripts/merge.sh
```
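If you prefer not to edit `scripts/merge.sh` interactively, you can rewrite the assignments with `sed`. This sketch assumes that `BASE_DIR=` and `MODEL_NAME=` each appear at the start of a line in the script. The `set_script_var` helper is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace a line-leading NAME=... assignment in a script file.
# Values containing '|', '&' or '\' would need extra escaping for sed.
set_script_var() {
  local file="$1" name="$2" value="$3"
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak "s|^${name}=.*|${name}=\"${value}\"|" "$file" && rm -f "${file}.bak"
}

# Example usage (paths are placeholders):
# set_script_var scripts/merge.sh BASE_DIR "/path/to/your/checkpoints/"
# set_script_var scripts/merge.sh MODEL_NAME "your_model_name"
```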

The script will automatically:
- Find all `global_step_*` directories
- Convert FSDP/Megatron checkpoints to Hugging Face format
- Save merged models to `iter_XXXXXX/actor/unify_checkpoint/`
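To see which checkpoints the conversion will pick up, you can list the `global_step_*` directories yourself. The sketch below approximates the discovery step and is not the script's own code. It sorts by step number, because a plain lexical sort would place `global_step_100` before `global_step_20`.

```bash
#!/usr/bin/env bash
# Sketch: list global_step_* checkpoint directories under a base directory,
# printing their step numbers in ascending numeric order.
list_checkpoint_steps() {
  local base_dir="$1" dir
  for dir in "$base_dir"/global_step_*; do
    [ -d "$dir" ] || continue            # skip the literal pattern when nothing matches
    echo "${dir##*/global_step_}"        # keep only the step number
  done | sort -n
}

# Print the highest step number, or nothing if there are no checkpoints.
latest_checkpoint_step() {
  list_checkpoint_steps "$1" | tail -n 1
}

BASE_DIR="${BASE_DIR:-/path/to/your/checkpoints/}"
for step in $(list_checkpoint_steps "$BASE_DIR"); do
  echo "found checkpoint: global_step_$step"
done
```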

## License

Apache License 2.0
