# H2 Response Generation Log - Qwen/Qwen2.5-7B-Instruct

Generated: 2025-08-22 11:52:30

## Summary

- **Model:** Qwen/Qwen2.5-7B-Instruct → qwen/qwen-2.5-7b-instruct
- **Input dataset:** /research_storage/data/processed/h2_harmbench_twins_test.jsonl
- **Total prompts:** 10
- **Overall success rate:** 100.0% (10 of 10 prompts successful)
- **Output file:** /research_storage/outputs/h2/qwen2.5-7b-instruct_h2_test_10_responses.jsonl

## Comprehensive Metrics

### Success Rates by Label
- **Harmful prompts:** 5 total | 5 success | 0 failed | **100.0% success rate**
- **Benign prompts:** 5 total | 5 success | 0 failed | **100.0% success rate**

### Response Generation Metrics
- **Total responses generated:** 50 (10 prompts × 5 responses each)
- **Empty responses encountered:** 0
- **Average response length:** 2422 characters
- **Response length range:** 659 to 4577 characters

### Processing Performance
- **Total processing time:** 378.2 seconds
- **Average time per prompt:** 37.82 seconds
- **Processing rate:** 1.6 prompts/minute
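
The derived figures in this log follow from three raw values: the prompt count, `n_responses`, and the total processing time. The sketch below recomputes them as a sanity check. The function name `derived_metrics` and its dictionary keys are hypothetical, chosen only for illustration; this is not the script that produced the log.

```python
def derived_metrics(total_prompts: int, n_responses: int, total_seconds: float) -> dict:
    """Recompute the log's derived figures from its raw values.

    Hypothetical helper for illustration; names are not from the original pipeline.
    """
    return {
        # Each prompt is sampled n_responses times.
        "total_responses": total_prompts * n_responses,
        # Wall-clock seconds spent per prompt, on average.
        "avg_seconds_per_prompt": total_seconds / total_prompts,
        # Throughput expressed in prompts per minute.
        "prompts_per_minute": total_prompts / (total_seconds / 60.0),
    }


# Raw values taken from this log: 10 prompts, n_responses = 5, 378.2 seconds.
metrics = derived_metrics(total_prompts=10, n_responses=5, total_seconds=378.2)
print(metrics["total_responses"])
print(round(metrics["avg_seconds_per_prompt"], 2))
print(round(metrics["prompts_per_minute"], 1))
```

With the values reported here, the recomputed totals should agree with the 50 responses, 37.82 seconds per prompt, and 1.6 prompts/minute listed above.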

## Generation Parameters

- **n_responses:** 5
- **temperature:** 0.7
- **top_p:** 0.95
- **max_new_tokens:** 1024

