# EVADE-Bench Project Code Documentation

This project is the code implementation of the EVADE-Bench benchmark, primarily used for evaluating the performance of large language models in text and image tasks. The project contains multiple functional modules covering data processing, model testing, image processing, and various other aspects.

## Project Structure

```
code/
├── image_fast_test_accuracy.py      # Image task fast test accuracy evaluation
├── pipeline1_data_filter.py         # Data filtering pipeline
├── text_allinone_rag_fast_test_accuracy.py  # Text RAG all-in-one testing
├── text_fast_test_accuracy.py       # Text task fast test accuracy evaluation
├── trans_pdf_rgb_to_cmyk.py         # PDF RGB to CMYK conversion tool
├── trans_rgb_to_cmyk.py             # Image RGB to CMYK conversion tool
└── utils.py                         # General utility function library
```

## Detailed Function Descriptions

### 1. `image_fast_test_accuracy.py` - Image Task Fast Test Accuracy Evaluation

**Function Description:**
- Specifically designed for testing the performance of large language models in image understanding tasks
- Supports multi-threaded parallel processing to improve testing efficiency
- Uses GPT-4o-0806 model for image analysis

**Key Features:**
- Multi-threaded concurrent processing (default 50 threads)
- Automatic retry mechanism (up to 100 times)
- Real-time test result saving
- Supports image base64 encoding processing
- Automatic accuracy calculation and statistics

**Use Cases:**
- Evaluate model's understanding ability of image content
- Batch testing of image-related risk identification tasks
- Generate detailed test reports and accuracy statistics

### 2. `pipeline1_data_filter.py` - Data Filtering Pipeline

**Function Description:**
- Implements a complete dataset filtering pipeline
- Supports deduplication, clustering, and sampling of text and image data
- Uses machine learning methods for data quality optimization

**Processing Pipeline:**
1. **ID Deduplication Stage**: Remove duplicate data entries
2. **Clustering and Sampling Stage**:
   - Text data: Use TF-IDF vectorization for clustering
   - Image data: Use ResNet18 feature extraction for clustering
3. **Model Validation Stage**: Simulate multi-model predictions to filter data with disagreements

**Key Features:**
- Supports both text and image modalities
- Automatic image download and processing
- Parallel processing for improved efficiency
- Intelligent clustering and sampling algorithms
- Automatic cleanup of temporary files

### 3. `text_allinone_rag_fast_test_accuracy.py` - Text RAG All-in-One Testing

**Function Description:**
- Implements Retrieval-Augmented Generation (RAG) text testing
- Combines few-shot learning with document retrieval
- Uses Qwen2.5-7b-instruct model

**Core Functions:**
- Automatically split dataset into document set and query set
- TF-IDF based similar document retrieval
- Construct one-shot example dialogue format
- Multi-threaded parallel processing of test data

**Technical Features:**
- Supports 100 concurrent threads
- Intelligent document retrieval algorithm
- Automatic few-shot prompt construction
- Real-time result saving and merging

### 4. `text_fast_test_accuracy.py` - Text Task Fast Test Accuracy Evaluation

**Function Description:**
- Specifically designed for pure text task performance testing
- Supports evaluation of multiple large language models
- Efficient multi-threaded processing architecture

**Main Functions:**
- Batch text task processing
- Multi-threaded concurrent execution (50 threads)
- Automatic retry and error handling
- Accuracy calculation and statistics

**Applicable Scenarios:**
- Text classification task evaluation
- Risk identification model testing
- Large-scale text dataset processing

### 5. `trans_pdf_rgb_to_cmyk.py` - PDF RGB to CMYK Conversion Tool

**Function Description:**
- Converts PDF files from RGB color space to CMYK color space
- Supports multi-page PDF processing
- Suitable for printing industry requirements

**Processing Flow:**
1. Use PyMuPDF to extract PDF pages
2. Convert pages to images
3. Use PIL for CMYK conversion
4. Regenerate PDF in CMYK format

**Technical Features:**
- Supports multi-page PDF processing
- Automatic temporary file management
- High-quality image conversion
- Suitable for printing standards

### 6. `trans_rgb_to_cmyk.py` - Image RGB to CMYK Conversion Tool

**Function Description:**
- Converts image files from RGB format to CMYK format
- Supports multiple image formats
- Provides detailed conversion information

**Main Functions:**
- Image format detection and conversion
- CMYK color space conversion
- Conversion result verification
- High-quality output saving

**Supported Formats:**
- Input: PNG, JPG, TIFF and other common formats
- Output: CMYK format JPEG files

### 7. `utils.py` - General Utility Function Library

**Function Description:**
- Provides various utility functions required by the project
- Contains core functions such as data processing, API calls, accuracy calculation

**Main Modules:**

#### Data Processing Functions
- `compute_token_diff()`: Calculate text differences
- `uniform_format_of_options()`: Uniform option formatting
- `extract_json()`: JSON string extraction
- `validate_and_extract_box_content()`: Extract box content

#### Accuracy Calculation Functions
- `calculate_accuracy_by_two_classify()`: Binary classification accuracy calculation
- `calculate_accuracy_by_multi_classify()`: Multi-classification accuracy calculation
- `get_inference_result_and_check_accuracy()`: Inference result accuracy checking

#### API Call Functions
- `call_idealab_api()`: Call Idealab API
- `call_qwen3_api()`: Call Qwen3 API
- `messages_builder_example()`: Build message format

#### Image Processing Functions
- `encode_image_to_base64()`: Image base64 encoding
- `concat_base64_image_url()`: Build image URL

#### RAG Related Functions
- `split_rag_dataset()`: Split RAG dataset
- `retrieve_similar_document()`: Retrieve similar documents
- `messages_builder_example_one_shot_text()`: Build one-shot text messages

## Environment Requirements
python==3.9.9
### Python Dependencies
```
openai
datasets
pandas
numpy
matplotlib
PIL
torch
torchvision
scikit-learn
requests
PyMuPDF
reportlab
```

### Environment Variables
The following environment variables need to be set:
- `api_key`: API key
- `base_url`: API base URL

## Usage Instructions

### 1. Install Dependencies
```bash
pip install -r requirements.txt
```

### 2. Set Environment Variables
```bash
export api_key="your_api_key"
export base_url="your_base_url"
```

### 3. Run Tests
```bash
# Image testing
python image_fast_test_accuracy.py

# Text testing
python text_fast_test_accuracy.py

# RAG testing
python text_allinone_rag_fast_test_accuracy.py

# Data filtering
python pipeline1_data_filter.py
```

## Output Description

### Test Results
- All test results are saved in the `../datas/` directory
- Filenames include timestamps to distinguish tests from different times
- Results include detailed accuracy statistics

### Log Output
- Real-time processing progress display
- Error information and retry records
- Final accuracy statistics

## Important Notes

1. **API Limits**: Pay attention to API call frequency limits, the program has built-in retry mechanisms
2. **Memory Usage**: Multi-threaded processing may consume more memory, please adjust the number of threads according to system configuration
3. **File Paths**: Ensure output directories exist and have write permissions
4. **Image Processing**: Image conversion tools require sufficient disk space to store temporary files

## Contributing Guidelines

Welcome to submit Issues and Pull Requests to improve the project. Before submitting code, please ensure:
1. Code complies with project standards
2. Add necessary comments and documentation
3. Test whether new features work properly

## License

This project uses the MIT license, see LICENSE file for details. 