# Data Processing Module

This module contains utilities for processing and preparing data for training and evaluation.

## Files

### build_dpo_data.py
Builds DPO (Direct Preference Optimization) training data from evaluation results:
- Loads evaluation results from multiple JSONL files
- Creates preference pairs based on template performance
- Filters data based on training set IDs
- Supports multiple risk types and content modes
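
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the record fields (`id`, `prompt`, `response`, `score`) and the helper names `load_jsonl` and `build_preference_pairs` are assumptions made for the example, and handling of risk types and content modes is left out.

```python
import json
from collections import defaultdict
from itertools import combinations


def load_jsonl(paths):
    """Read records from several JSONL files, skipping blank lines."""
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    records.append(json.loads(line))
    return records


def build_preference_pairs(records, train_ids, min_margin=0.0):
    """Pair higher- and lower-scoring responses that share an example ID.

    Field names ("id", "prompt", "response", "score") are hypothetical.
    """
    by_id = defaultdict(list)
    for record in records:
        if record["id"] in train_ids:  # keep only training-set examples
            by_id[record["id"]].append(record)

    pairs = []
    for example_id, group in by_id.items():
        for a, b in combinations(group, 2):
            if abs(a["score"] - b["score"]) <= min_margin:
                continue  # scores too close to express a preference
            chosen, rejected = (a, b) if a["score"] > b["score"] else (b, a)
            pairs.append({
                "id": example_id,
                "prompt": chosen["prompt"],
                "chosen": chosen["response"],
                "rejected": rejected["response"],
            })
    return pairs
```

Each resulting pair uses the `prompt` / `chosen` / `rejected` layout commonly expected by DPO trainers; raising `min_margin` drops pairs whose scores are too close to be informative.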

### cluster_and_sample.py
Implements clustering and sampling methods:
- Groups data by similarity or other criteria
- Provides sampling strategies for balanced datasets
- Handles large-scale data processing efficiently
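
As a rough illustration of grouping followed by balanced sampling, the sketch below groups items with a caller-supplied key function and then draws up to a fixed number of items from each group. The function names `group_by` and `sample_balanced` and the `category` field are hypothetical; the actual script may cluster by similarity rather than by an explicit key.

```python
import random
from collections import defaultdict


def group_by(items, key):
    """Group items into lists keyed by key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)


def sample_balanced(groups, per_group, seed=0):
    """Draw up to per_group items from each group without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sampled = []
    for label in sorted(groups):  # stable group order across runs
        members = groups[label]
        k = min(per_group, len(members))  # small groups contribute all members
        sampled.extend(rng.sample(members, k))
    return sampled


# Hypothetical input: records tagged with a "category" field.
items = [{"category": "x", "n": i} for i in range(10)]
items += [{"category": "y", "n": i} for i in range(3)]
groups = group_by(items, key=lambda item: item["category"])
balanced = sample_balanced(groups, per_group=5)
```

Capping each group at `per_group` keeps a large group from dominating the sample, while groups smaller than the cap are kept whole.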

## Usage

Use the scripts in this module to prepare training data for template selection and evaluation models.