# TorchTitan to HuggingFace Conversion Tools

This directory contains tools to convert TorchTitan distributed checkpoints (DCP format) to HuggingFace transformers format, allowing you to use your trained models with the HuggingFace ecosystem.

## Files

- **`convert_dcp_to_hf.py`** - Main conversion script
- **`test_conversion.py`** - Test script to verify the conversion works
- **`example_convert.sh`** - Example shell script showing how to run the conversion
- **`README.md`** - This file

## Quick Start

1. **Test the conversion** (recommended first step):
   ```bash
   cd convert_to_HF
   python test_conversion.py
   ```

2. **Run the conversion** using the example script:
   ```bash
   ./example_convert.sh
   ```

3. **Or run manually**:
   ```bash
   python convert_dcp_to_hf.py /path/to/dcp/checkpoint /path/to/output/hf/model
   ```

## Understanding Your Checkpoint Structure

Your TorchTitan checkpoint should be a directory containing multiple `.distcp` files from distributed training:

```
step-10000/
├── __0_0.distcp
├── __1_0.distcp
├── __2_0.distcp
├── __3_0.distcp
├── __4_0.distcp
├── __5_0.distcp
├── __6_0.distcp
├── __7_0.distcp
└── .metadata
```

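Each rank writes its shards to its own `__{rank}_{n}.distcp` file(s), so this listing comes from 8-rank distributed training. The `.metadata` file records how the shards fit together and is required to load the checkpoint.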
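Before converting, you can check a step directory against this layout. The sketch below uses a hypothetical helper (`inspect_dcp_dir` is not part of these tools). Treating the first number in `__{rank}_{n}.distcp` as the writing rank follows the naming used by PyTorch's default file-system writer; that is an assumption if your checkpoint was written some other way.

```python
import re
from pathlib import Path

# Assumed shard naming: __{rank}_{file_index}.distcp
_SHARD_RE = re.compile(r"__(\d+)_(\d+)\.distcp")


def inspect_dcp_dir(path: str) -> dict:
    """Summarize a DCP step directory: metadata presence, shard files, writer ranks."""
    ckpt = Path(path)
    shard_names = [p.name for p in ckpt.iterdir() if _SHARD_RE.fullmatch(p.name)]
    # The first number in each file name is taken to be the rank that wrote it
    ranks = {int(_SHARD_RE.fullmatch(name).group(1)) for name in shard_names}
    return {
        "has_metadata": (ckpt / ".metadata").is_file(),
        "num_shard_files": len(shard_names),
        "num_writer_ranks": len(ranks),
    }
```

For the listing above, this would report `.metadata` present, 8 shard files, and 8 writer ranks. A missing `.metadata` or zero shard files usually means the path points at the wrong directory.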

## How the Conversion Works

### 1. Checkpoint Loading
The script uses PyTorch's Distributed Checkpoint (DCP) loader to reconstruct the full model state from the distributed pieces.
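One way to do this outside a distributed job is PyTorch's `dcp_to_torch_save` (in `torch.distributed.checkpoint.format_utils`, available in recent PyTorch releases), which reads every shard in a single process and writes one ordinary `torch.save` file. The sketch below assumes that approach; it is not necessarily what `convert_dcp_to_hf.py` does internally, and `load_full_state_dict` is a hypothetical name. It writes a temporary full-size copy to disk, and the handling of a `"model"` key is an assumption meant to cover TorchTitan versions that nest model weights under that key.

```python
import os
import tempfile
from pathlib import Path


def load_full_state_dict(checkpoint_dir: str) -> dict:
    """Reassemble a TorchTitan DCP checkpoint into one in-memory state dict (CPU tensors)."""
    ckpt = Path(checkpoint_dir)
    if not (ckpt / ".metadata").is_file():
        # DCP cannot load without .metadata; usually the wrong directory was given
        raise FileNotFoundError(f"{ckpt} has no .metadata file; pass the step-N directory")

    # Imported here so the path check above fails fast without importing torch
    import torch
    from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

    with tempfile.TemporaryDirectory() as tmp:
        full_path = os.path.join(tmp, "full_state_dict.pt")
        # Reads every .distcp shard in this single process and writes one torch.save file
        dcp_to_torch_save(str(ckpt), full_path)
        # The file was just produced from your own checkpoint, so full unpickling is acceptable
        state = torch.load(full_path, map_location="cpu", weights_only=False)

    # Assumption: depending on the TorchTitan version, model tensors sit at the top level or under "model"
    model_state = state.get("model", state)
    # Keep tensors only; this drops nested entries such as optimizer or dataloader state
    return {k: v for k, v in model_state.items() if isinstance(v, torch.Tensor)}
```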

### 2. Model Structure Analysis
The conversion script analyzes the TorchTitan checkpoint structure and extracts:
- Model dimensions (vocab size, hidden size, number of layers)
- Attention configuration (number of heads, GQA settings)
- FFN configuration
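Most of these dimensions can be read directly from parameter shapes, since `nn.Linear` stores its weight as `(out_features, in_features)`. A minimal sketch, assuming the TorchTitan parameter names from the mapping table below and a `{name: shape}` dict built from the loaded state dict; `infer_dims` is hypothetical. Note that the head *count* is not visible in the shapes: `wq` and `wk` only reveal `n_heads * head_dim` and `n_kv_heads * head_dim`.

```python
import re


def infer_dims(shapes: dict) -> dict:
    """Read model dimensions off TorchTitan parameter shapes.

    `shapes` maps TorchTitan parameter names to shapes, e.g.
    {name: tuple(t.shape) for name, t in state_dict.items()}.
    nn.Linear weights are stored as (out_features, in_features).
    """
    vocab_size, hidden_size = shapes["tok_embeddings.weight"]
    matches = (re.match(r"layers\.(\d+)\.", name) for name in shapes)
    layer_ids = {int(m.group(1)) for m in matches if m}
    if not layer_ids:
        raise ValueError("no layers.{i}.* parameters found")
    return {
        "vocab_size": vocab_size,
        "hidden_size": hidden_size,
        "num_layers": max(layer_ids) + 1,
        # w1 (gate_proj) maps hidden_size -> FFN intermediate size
        "intermediate_size": shapes["layers.0.feed_forward.w1.weight"][0],
        # n_heads * head_dim; the head count itself is not recoverable from shapes
        "q_proj_out": shapes["layers.0.attention.wq.weight"][0],
        # n_kv_heads * head_dim; smaller than q_proj_out when GQA is used
        "kv_proj_out": shapes["layers.0.attention.wk.weight"][0],
    }
```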

### 3. Parameter Name Mapping
TorchTitan uses different parameter names than HuggingFace. The conversion maps:

| TorchTitan | HuggingFace |
|------------|-------------|
| `tok_embeddings.weight` | `model.embed_tokens.weight` |
| `layers.{i}.attention.wq.weight` | `model.layers.{i}.self_attn.q_proj.weight` |
| `layers.{i}.attention.wk.weight` | `model.layers.{i}.self_attn.k_proj.weight` |
| `layers.{i}.attention.wv.weight` | `model.layers.{i}.self_attn.v_proj.weight` |
| `layers.{i}.attention.wo.weight` | `model.layers.{i}.self_attn.o_proj.weight` |
| `layers.{i}.feed_forward.w1.weight` | `model.layers.{i}.mlp.gate_proj.weight` |
| `layers.{i}.feed_forward.w2.weight` | `model.layers.{i}.mlp.down_proj.weight` |
| `layers.{i}.feed_forward.w3.weight` | `model.layers.{i}.mlp.up_proj.weight` |
| `layers.{i}.attention_norm.weight` | `model.layers.{i}.input_layernorm.weight` |
| `layers.{i}.ffn_norm.weight` | `model.layers.{i}.post_attention_layernorm.weight` |
| `norm.weight` | `model.norm.weight` |
| `output.weight` | `lm_head.weight` |
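The renaming itself is mechanical. One further step matters for correctness: TorchTitan's Llama applies RoPE to adjacent pairs of each head's dimensions (the complex-valued `freqs_cis` path), whereas HuggingFace's `LlamaAttention` pairs dimension `i` with `i + head_dim/2` (`rotate_half`). HuggingFace's own `convert_llama_weights_to_hf.py` reorders the `q_proj`/`k_proj` rows per head for this reason when importing Meta-format weights, so a TorchTitan export needs the same reordering unless `convert_dcp_to_hf.py` already performs it. The sketch below (hypothetical helper names) covers both the renames and that row order, using only the standard library so the index logic can be checked without tensors.

```python
import re
from typing import Optional

# Top-level renames from the mapping table
TOP_LEVEL = {
    "tok_embeddings.weight": "model.embed_tokens.weight",
    "norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}

# Per-layer renames: TorchTitan suffix -> HuggingFace suffix
PER_LAYER = {
    "attention.wq.weight": "self_attn.q_proj.weight",
    "attention.wk.weight": "self_attn.k_proj.weight",
    "attention.wv.weight": "self_attn.v_proj.weight",
    "attention.wo.weight": "self_attn.o_proj.weight",
    "feed_forward.w1.weight": "mlp.gate_proj.weight",
    "feed_forward.w2.weight": "mlp.down_proj.weight",
    "feed_forward.w3.weight": "mlp.up_proj.weight",
    "attention_norm.weight": "input_layernorm.weight",
    "ffn_norm.weight": "post_attention_layernorm.weight",
}

LAYER_RE = re.compile(r"layers\.(\d+)\.(.+)")


def to_hf_name(name: str) -> Optional[str]:
    """Return the HuggingFace name for a TorchTitan parameter, or None to skip it."""
    if name == "freqs_cis":
        return None  # RoPE buffer; HuggingFace recomputes it from the config
    if name in TOP_LEVEL:
        return TOP_LEVEL[name]
    match = LAYER_RE.fullmatch(name)
    if match and match.group(2) in PER_LAYER:
        return f"model.layers.{match.group(1)}.{PER_LAYER[match.group(2)]}"
    raise KeyError(f"no HuggingFace name for {name!r}")


def rope_permutation_indices(n_heads: int, head_dim: int) -> list:
    """Row order converting interleaved-RoPE q/k weights to HuggingFace's half-split layout.

    Within each head, rows (2i, 2i+1) form one rotary pair in TorchTitan's layout;
    HuggingFace expects all first elements of the pairs, then all second elements.
    Equivalent to w.view(n_heads, head_dim // 2, 2, in_dim).transpose(1, 2).reshape(-1, in_dim).
    """
    if head_dim % 2:
        raise ValueError("head_dim must be even for RoPE")
    order = []
    for h in range(n_heads):
        base = h * head_dim
        order.extend(base + 2 * i for i in range(head_dim // 2))      # first element of each pair
        order.extend(base + 2 * i + 1 for i in range(head_dim // 2))  # second element of each pair
    return order
```

With PyTorch, the row order is applied by indexing, e.g. `q_proj = wq[rope_permutation_indices(n_heads, head_dim)]`, using `n_kv_heads` instead of `n_heads` for `wk`. The `wv`, `wo`, MLP, and norm weights are only renamed.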

### 4. Configuration Generation
The script automatically infers the HuggingFace `LlamaConfig` from the model weights, including:
- Vocabulary size
- Hidden dimensions
- Number of layers and attention heads
- Intermediate (FFN) size
- GQA configuration (if applicable)
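Turning weight-derived dimensions into `LlamaConfig` arguments needs one extra assumption, the per-head dimension, because only `n_heads * head_dim` is visible in the weights. The sketch below assumes `head_dim=128` (the value used by Llama 2 and Llama 3 models) and fills the fields that cannot be inferred with Llama-3-style placeholder values you should override. `llama_config_kwargs` is hypothetical; its `dims` argument uses the keys `vocab_size`, `hidden_size`, `intermediate_size`, `num_layers`, `q_proj_out`, and `kv_proj_out`.

```python
def llama_config_kwargs(dims: dict, head_dim: int = 128, **overrides) -> dict:
    """Build keyword arguments for transformers.LlamaConfig from inferred dimensions.

    head_dim is an assumption: weights only reveal n_heads * head_dim.
    """
    q_out, kv_out = dims["q_proj_out"], dims["kv_proj_out"]
    if q_out % head_dim or kv_out % head_dim:
        raise ValueError(f"projection sizes {q_out}, {kv_out} are not multiples of head_dim={head_dim}")
    kwargs = {
        "vocab_size": dims["vocab_size"],
        "hidden_size": dims["hidden_size"],
        "intermediate_size": dims["intermediate_size"],
        "num_hidden_layers": dims["num_layers"],
        "num_attention_heads": q_out // head_dim,
        # Equals num_attention_heads unless the model uses GQA
        "num_key_value_heads": kv_out // head_dim,
        # Not recoverable from the weights: hypothetical Llama-3-style placeholders
        "rope_theta": 500000.0,
        "rms_norm_eps": 1e-5,
        "max_position_embeddings": 8192,
        "tie_word_embeddings": False,
    }
    kwargs.update(overrides)  # e.g. the rope_theta actually used in training
    return kwargs
```

Usage: `LlamaConfig(**llama_config_kwargs(dims, rope_theta=10000.0))`, replacing the placeholder values with the ones from your training config.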

## Command Line Usage

```bash
python convert_dcp_to_hf.py [OPTIONS] CHECKPOINT_PATH OUTPUT_PATH
```

### Arguments:
- `CHECKPOINT_PATH` - Path to TorchTitan DCP checkpoint directory
- `OUTPUT_PATH` - Where to save the HuggingFace model

### Options:
- `--config PATH` - Optional: use an existing HuggingFace `config.json` instead of inferring one
- `--push-to-hub` - Push the converted model to the HuggingFace Hub
- `--repo-name NAME` - Repository name for the Hub upload (required with `--push-to-hub`)
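For reference, an interface with these arguments can be declared with `argparse` as below, including the rule that `--repo-name` must accompany `--push-to-hub`. This is a sketch of the documented interface, not the script's actual source; `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert a TorchTitan DCP checkpoint to HuggingFace format"
    )
    parser.add_argument("checkpoint_path", help="TorchTitan DCP checkpoint directory (e.g. step-10000)")
    parser.add_argument("output_path", help="Where to save the HuggingFace model")
    parser.add_argument("--config", default=None, help="Existing HuggingFace config.json to use instead of inferring one")
    parser.add_argument("--push-to-hub", action="store_true", help="Push the converted model to the HuggingFace Hub")
    parser.add_argument("--repo-name", default=None, help="Hub repository name, e.g. username/my-model")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Enforce the documented dependency between the two Hub options
    if args.push_to_hub and not args.repo_name:
        parser.error("--repo-name is required with --push-to-hub")  # exits with status 2
    return args
```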

### Examples:

Basic conversion:
```bash
python convert_dcp_to_hf.py /path/to/step-10000 ./my_hf_model
```

With custom config:
```bash
python convert_dcp_to_hf.py /path/to/step-10000 ./my_hf_model --config custom_config.json
```

Push to Hub:
```bash
python convert_dcp_to_hf.py /path/to/step-10000 ./my_hf_model --push-to-hub --repo-name username/my-model
```

## Using the Converted Model

After conversion, you can load and use your model with HuggingFace transformers:

```python
from transformers import AutoTokenizer, LlamaForCausalLM

# Load the converted model (the OUTPUT_PATH you passed to convert_dcp_to_hf.py)
model = LlamaForCausalLM.from_pretrained('./my_hf_model')

# Load the same tokenizer you used during training; the Llama 2 repo below is only an example.
# AutoTokenizer also handles tokenizers (e.g. Llama 3) that LlamaTokenizer cannot load.
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

# Generate up to 50 new tokens after the prompt
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Troubleshooting

### Common Issues:

1. **Missing dependencies**:
   ```bash
   pip install transformers torch
   ```

2. **Out-of-memory errors**: The conversion loads the full, unsharded model into memory. If you run out of memory, try:
   - Running on a machine with more RAM
   - Running the conversion on CPU, so it is limited by host RAM rather than GPU memory (slower)

3. **Checkpoint path not found**: Ensure you're pointing to the step directory (e.g., `step-10000`), not the parent directory.

4. **Missing `.metadata` file**: This indicates the checkpoint may be corrupted or incomplete.
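For the out-of-memory case, a back-of-the-envelope estimate helps decide how much RAM the conversion needs. The sketch below counts parameters for a bias-free Llama layout matching the mapping table (untied embeddings by default) and converts the count to GiB. The function names are hypothetical, and the estimate ignores temporary copies made during conversion; if the checkpoint also contains AdamW state (two extra tensors per parameter), loading all of it takes roughly three times as much.

```python
# Bytes per element for the dtypes a checkpoint is commonly stored in
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2, "float16": 2}


def llama_param_count(vocab_size: int, hidden_size: int, intermediate_size: int,
                      num_layers: int, kv_dim: int, tie_word_embeddings: bool = False) -> int:
    """Parameter count for a bias-free Llama layout.

    kv_dim is the output size of wk/wv (n_kv_heads * head_dim); it equals
    hidden_size without GQA. Assumes wq and wo are hidden_size x hidden_size.
    """
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # wq, wo, wk, wv
    ffn = 3 * hidden_size * intermediate_size  # w1, w2, w3
    norms = 2 * hidden_size  # attention_norm, ffn_norm
    embeddings = vocab_size * hidden_size  # tok_embeddings
    output = 0 if tie_word_embeddings else vocab_size * hidden_size  # output (lm_head)
    # Final norm adds one more hidden_size vector
    return embeddings + num_layers * (attention + ffn + norms) + hidden_size + output


def state_dict_gib(num_params: int, dtype: str = "float32") -> float:
    """Memory needed to hold num_params elements of the given dtype, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1024**3
```

With Llama-3-8B-like shapes, `llama_param_count(128256, 4096, 14336, 32, 1024)` gives 8,030,261,248 parameters, about 30 GiB in float32 or 15 GiB in bfloat16.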

### Validation:

Always run the test script first to validate your environment:
```bash
python test_conversion.py
```

This will verify:
- All dependencies are installed
- Conversion functions work correctly
- Your checkpoint path exists and is valid
- The conversion pipeline produces valid HuggingFace models
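Beyond the test script, a cheap sanity check on a converted state dict is to compare its keys with the names the mapping table predicts. The sketch below is hypothetical (`expected_hf_keys` and `check_key_coverage` are not part of these tools), assumes untied embeddings unless told otherwise, and checks names only, not values.

```python
# Per-layer HuggingFace parameter suffixes from the mapping table
HF_LAYER_SUFFIXES = [
    "self_attn.q_proj.weight",
    "self_attn.k_proj.weight",
    "self_attn.v_proj.weight",
    "self_attn.o_proj.weight",
    "mlp.gate_proj.weight",
    "mlp.down_proj.weight",
    "mlp.up_proj.weight",
    "input_layernorm.weight",
    "post_attention_layernorm.weight",
]


def expected_hf_keys(num_layers: int, tie_word_embeddings: bool = False) -> set:
    """All parameter names a converted Llama state dict should contain."""
    keys = {"model.embed_tokens.weight", "model.norm.weight"}
    if not tie_word_embeddings:
        keys.add("lm_head.weight")
    for i in range(num_layers):
        keys.update(f"model.layers.{i}.{suffix}" for suffix in HF_LAYER_SUFFIXES)
    return keys


def check_key_coverage(converted_keys, num_layers: int, tie_word_embeddings: bool = False):
    """Return (missing, unexpected) key lists; both should be empty."""
    expected = expected_hf_keys(num_layers, tie_word_embeddings)
    actual = set(converted_keys)
    return sorted(expected - actual), sorted(actual - expected)
```

For example, `missing, unexpected = check_key_coverage(hf_state_dict.keys(), config.num_hidden_layers)` on the converted state dict before saving; any non-empty list points at a mapping gap.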

## Notes

- The `freqs_cis` buffer from TorchTitan is not converted, because HuggingFace recomputes the RoPE frequencies from the config (`rope_theta`)
- Hyperparameters that are not part of the learned weights (such as `rope_theta`, `rms_norm_eps`, and `max_position_embeddings`) may need manual adjustment in the generated config
- The script assumes the standard Llama architecture; modifications may require code changes
- For models with custom attention implementations, you may need to adjust the conversion logic

## Support

If you encounter issues:
1. First run `python test_conversion.py` to diagnose the problem
2. Check that your checkpoint structure matches the expected format
3. Verify all dependencies are installed correctly 