# IFSandbox

IFSandbox is a sandbox for instruction fine-tuning. It focuses on improving the quality and diversity of the instruction data used to train language models.

## Features

- Instruction collection and quality control
- Instruction enhancement with difficulty control
- Automated verification system
- Integration with VERL for model training
- Configurable API endpoints for model inference

## Installation

1. Clone the repository:
```bash
git clone [your-repo-url]
cd ifsandbox
```

2. Install in development mode:
```bash
pip install -e .
```

3. Check the environment requirements:
- Python >= 3.6
- PyTorch
- OpenAI API access (needed only for some features)
- Other dependencies are installed automatically by `setup.py` during step 2

## Project Structure

```
ifsandbox/
├── modules/      # Core modules for instruction processing
├── infer/        # Inference and API integration code
├── data/         # Data directory for instruction datasets
├── scripts/      # Utility scripts
├── logs/         # Log files
└── outputs/      # Output files and results
```

## Configuration

The project uses a centralized configuration system through `global_config.py`. Key configurations include:

- Server endpoints for model inference
- Project paths and directories
- Model configurations

Example usage:
```python
from global_config import PROJECT_ROOT, PROJECT_DATA
```
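
For orientation, a centralized configuration module of this kind might look roughly like the sketch below. Only `PROJECT_ROOT` and `PROJECT_DATA` are names taken from this README. The environment variables, `PROJECT_LOGS`, `PROJECT_OUTPUTS`, `SERVER_ENDPOINTS`, the placeholder URL, and `data_path` are hypothetical and may not match the actual `global_config.py`.

```python
import os
from pathlib import Path

# Root of the checkout. IFSANDBOX_ROOT is a hypothetical override variable.
PROJECT_ROOT = Path(os.environ.get("IFSANDBOX_ROOT", Path.cwd())).resolve()

# Directories mirroring the project structure listed above.
PROJECT_DATA = PROJECT_ROOT / "data"
PROJECT_LOGS = PROJECT_ROOT / "logs"        # hypothetical name
PROJECT_OUTPUTS = PROJECT_ROOT / "outputs"  # hypothetical name

# Hypothetical inference endpoint table; the URL is a placeholder, not a real server.
SERVER_ENDPOINTS = {
    "default": os.environ.get("IFSANDBOX_API_BASE", "http://localhost:8000/v1"),
}


def data_path(*parts: str) -> Path:
    """Build a path below PROJECT_DATA (hypothetical helper)."""
    return PROJECT_DATA.joinpath(*parts)
```

Keeping paths and endpoints in one module means scripts can resolve dataset locations without hard-coding absolute paths.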

## Core Modules

### 1. Instruction Collection and Quality Control
- **Instruction Processing**: Structures and processes raw instruction data
- **Instag Module**: Performs instruction selection and classification
- **Selection Strategy**: Configurable strategies for quality instruction selection
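
As an illustration of what a tag-driven selection strategy could look like, the sketch below greedily picks instructions that add the most unseen tags. It assumes each record already carries a `tags` list produced by a tagging step. The record layout and the `select_by_tag_coverage` function are hypothetical and are not the module's actual interface.

```python
from typing import Dict, List, Set


def select_by_tag_coverage(records: List[Dict], budget: int) -> List[Dict]:
    """Greedily pick up to `budget` records, each adding the most unseen tags."""
    selected: List[Dict] = []
    covered: Set[str] = set()
    remaining = list(records)
    while remaining and len(selected) < budget:
        # Record whose tags contribute the most new coverage.
        best = max(remaining, key=lambda r: len(set(r["tags"]) - covered))
        if not set(best["tags"]) - covered:
            break  # no remaining record adds a new tag
        selected.append(best)
        covered |= set(best["tags"])
        remaining.remove(best)
    return selected


records = [
    {"instruction": "Summarize this article.", "tags": ["summarization"]},
    {"instruction": "Translate and summarize the text.",
     "tags": ["translation", "summarization"]},
    {"instruction": "Write a haiku about rain.", "tags": ["creative-writing"]},
]
picked = select_by_tag_coverage(records, budget=2)
```

Here the second and third records are picked: together they cover all three tags, while the first record adds nothing new.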

### 2. Instruction Enhancement
- **Difficulty Control**: Uses benchmark models to assess and adjust instruction difficulty
- **Enhancement Module**: Implements evol-instruct methodology with constraint addition
- **Metadata Management**: Tracks and manages instruction metadata for verification
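
The interplay between constraint addition and metadata tracking can be sketched as follows. Evol-instruct style enhancement normally asks an LLM to rewrite the instruction. This sketch swaps in fixed templates so that it runs offline. The constraint identifiers, templates, and record fields are all hypothetical.

```python
import random
from typing import Dict

# Hypothetical constraint templates: (id, text template, parameter choices).
CONSTRAINTS = [
    ("max_words", "Answer in at most {value} words.", [50, 100, 200]),
    ("must_include", "Include the word '{value}' in your answer.",
     ["example", "summary"]),
]


def add_constraint(record: Dict, rng: random.Random) -> Dict:
    """Return a new record with one extra constraint and updated metadata."""
    cid, template, choices = rng.choice(CONSTRAINTS)
    value = rng.choice(choices)
    return {
        "instruction": f"{record['instruction']} {template.format(value=value)}",
        # Metadata kept alongside the text so a verifier can check it later.
        "constraints": record.get("constraints", []) + [{"id": cid, "value": value}],
        # Number of enhancement rounds applied so far.
        "depth": record.get("depth", 0) + 1,
    }


rng = random.Random(0)
seed = {"instruction": "Explain what a hash table is."}
evolved = add_constraint(add_constraint(seed, rng), rng)
```

Storing each constraint as structured metadata, rather than only as text in the prompt, is what makes later rule-based verification possible.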

### 3. Verification System
- Rule-based verification
- LLM-based quality checks
- Constraint validation
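
A rule-based verifier for constraint metadata might be organized as a registry of small checker functions, as in the sketch below. The constraint identifiers (`max_words`, `must_include`) and the `verify` function are hypothetical. Constraints that cannot be expressed as rules would presumably be routed to the LLM-based checks instead.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a constraint id to a rule-based checker.
CHECKERS: Dict[str, Callable[[str, object], bool]] = {
    "max_words": lambda response, value: len(response.split()) <= int(value),
    "must_include": lambda response, value: str(value).lower() in response.lower(),
}


def verify(response: str, constraints: List[Dict]) -> List[bool]:
    """Return one pass/fail flag per constraint, in the order given."""
    results = []
    for constraint in constraints:
        checker = CHECKERS.get(constraint["id"])
        if checker is None:
            # No rule available; a caller could fall back to an LLM check here.
            raise KeyError(f"no rule-based checker for {constraint['id']!r}")
        results.append(checker(response, constraint["value"]))
    return results


flags = verify(
    "A hash table maps keys to values, for example a Python dict.",
    [{"id": "max_words", "value": 50}, {"id": "must_include", "value": "example"}],
)
```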

### 4. Training Integration
- VERL integration for model training
- Performance monitoring
- Result analysis
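
One plausible way to connect verification to training is a scoring function that turns constraint checks into a scalar reward. The sketch below is illustrative only. The four-argument signature is an assumption about how a custom scoring function is passed to the trainer and should be confirmed against the VERL documentation. The JSON layout of `ground_truth` and the constraint identifiers are hypothetical.

```python
import json


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return the fraction of constraints that the response satisfies.

    `ground_truth` is assumed to be a JSON list such as
    [{"id": "max_words", "value": 50}]; this layout is hypothetical.
    """
    constraints = json.loads(ground_truth)
    if not constraints:
        return 0.0
    passed = 0
    for constraint in constraints:
        if constraint["id"] == "max_words":
            passed += len(solution_str.split()) <= int(constraint["value"])
        elif constraint["id"] == "must_include":
            passed += str(constraint["value"]).lower() in solution_str.lower()
        # Unknown constraint ids count as not satisfied.
    return passed / len(constraints)


truth = json.dumps([
    {"id": "max_words", "value": 5},
    {"id": "must_include", "value": "example"},
])
score = compute_score("ifsandbox", "A short summary.", truth)
```

A fractional score gives partial credit for responses that satisfy only some constraints, which is one of several reasonable reward designs.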

## Usage Examples

See the `demo/` directory for detailed examples and tutorials.

## Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

## License

[Your License]

## Contact

[Your Contact Information]



Index: `ifsandbox/data/v1_seed/random_selected_data_350k.jsonl`
