# RevAm Supplementary Materials

## Overview

This repository presents RevAm, an RL-based trajectory optimization framework for evaluating and bypassing defense mechanisms in diffusion models. It combines reinforcement learning with large language model (LLM) feedback to develop attack strategies that adapt to different types of content-filtering defenses.

## Project Structure

### Core Components

#### `intelligent_attack/`
The main attack framework containing:

- **`core.py`**: Implements the `IntelligentAdaptiveAttacker` class with policy network and reinforcement learning algorithms
- **`detectors.py`**: Factory pattern implementation for various content detection systems
- **`attack_methods.py`**: Core attack algorithms including direction-based and magnitude-based defense bypass techniques
- **`evaluation.py`**: Evaluation manager for LLM-based attack assessment
- **`experiment.py`**: Experiment management and result tracking
- **`llm_feedback.py`**: LLM integration for adaptive feedback and strategy refinement
- **`style_detectors.py`**: Specialized detectors for artistic style recognition (Van Gogh, Picasso)
- **`category_clip_detectors.py`**: CLIP-based category detection capabilities

#### `asr_evaluation/`
Attack Success Rate (ASR) calculation and evaluation:

- **`asr_calculator.py`**: Main ASR computation engine for batch and individual experiment evaluation
- **`nude_detector.py`**: NudeNet-based content detection for nudity evaluation
- **`metrics.py`**: Statistical analysis and reporting utilities
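As a rough illustration of the metric these modules report, ASR can be read as the fraction of experiments whose output the detector flags. The sketch below assumes that definition; `ExperimentResult`, its fields, and `attack_success_rate` are hypothetical names for illustration and are not taken from `asr_calculator.py`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ExperimentResult:
    """Hypothetical record for one evaluated prompt (not the repository's type)."""
    prompt_id: str
    flagged: bool  # True if the detector flagged the generated image


def attack_success_rate(results: Iterable[ExperimentResult]) -> float:
    """Return the fraction of results flagged by the detector (0.0 if empty)."""
    results = list(results)
    if not results:
        return 0.0
    return sum(r.flagged for r in results) / len(results)


if __name__ == "__main__":
    batch = [
        ExperimentResult("p1", True),
        ExperimentResult("p2", False),
        ExperimentResult("p3", True),
        ExperimentResult("p4", False),
    ]
    print(f"ASR: {attack_success_rate(batch):.2%}")
```

Returning `0.0` for an empty batch is a design choice made here to avoid a division by zero; the actual calculator may handle that case differently.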

#### `q16_evaluation/`
Q16 violence detection framework:

- **`q16_detector.py`**: Violence detection using Q16 classifier
- **`q16_classifier.py`**: CLIP-based violence classification implementation

#### `datasets/`
Curated datasets for different attack scenarios:

- **`nudity.csv`**: Nudity-related prompts with ground truth annotations
- **`violence.csv`**: Violence-related prompts with multi-dimensional toxicity scores
- **`VanGogh.csv`**: Van Gogh style prompts for artistic style attack evaluation
- **`PabloPicasso.csv`**: Picasso style prompts for artistic style attack evaluation
- **Additional datasets**: Entity, relationship, and abstraction datasets for comprehensive evaluation

### Main Execution

#### `run_intelligent_attack.py`
Primary execution script that orchestrates the entire attack pipeline:

- Configures attack parameters for different content types
- Manages experiment execution and result collection
- Generates comprehensive reports with success metrics
- Supports multiple attack types: nudity, violence, artistic styles, entity, abstraction, relationship, and celebrity

## Usage

### Configuration

Modify the configuration parameters in `run_intelligent_attack.py`:

```python
CURRENT_ATTACK_TYPE = "nude"  # Options: "nude", "violence", "vangogh", "pablo_picasso", ...
DEFENSE_WEIGHTS_PATH = ""     # Path to defense model weights
MODEL_ID = "black-forest-labs/FLUX.1-dev"
LLM_PROVIDER = "openrouter"   # LLM service provider
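LLM_API_KEY = "xx"            # Placeholder: your LLM API key
LLM_MODEL = "xx"              # Placeholder: LLM model identifier
LLM_BASE_URL = "xx"           # Placeholder: base URL of the LLM API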
```
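To keep credentials out of version control, the LLM settings could be read from environment variables instead of being written into the script. This is a minimal sketch, not part of the repository: `load_llm_settings` is a hypothetical helper, and the environment variable names simply mirror the constants above.

```python
import os
from typing import Mapping, Optional


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read LLM settings from environment variables (hypothetical helper).

    The variable names mirror the script's constants; this is an assumption,
    not something the script itself reads.
    """
    env = os.environ if env is None else env
    api_key = env.get("LLM_API_KEY", "")
    if not api_key:
        # Fail early rather than sending unauthenticated requests later.
        raise RuntimeError("LLM_API_KEY is not set")
    return {
        "provider": env.get("LLM_PROVIDER", "openrouter"),
        "api_key": api_key,
        "model": env.get("LLM_MODEL", ""),
        "base_url": env.get("LLM_BASE_URL", ""),
    }


if __name__ == "__main__":
    # Example with an explicit mapping instead of the real environment.
    settings = load_llm_settings({"LLM_API_KEY": "example-key"})
    print(settings["provider"])
```

The returned values could then be assigned to `LLM_PROVIDER`, `LLM_API_KEY`, `LLM_MODEL`, and `LLM_BASE_URL` at the top of the script.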

### Execution

```bash
python run_intelligent_attack.py
```

### Results

The framework generates:

- **Individual experiment reports**: Detailed results for each prompt
- **Batch evaluation reports**: Aggregate statistics across all experiments
- **Visual comparisons**: Baseline vs. attack images for qualitative assessment
- **ASR calculations**: Quantitative success rate metrics

## Dependencies

- PyTorch
- Diffusers
- Transformers
- PIL/Pillow
- NumPy
- Pandas
- CLIP models
- NudeNet
- Various LLM API clients
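
Before running the pipeline, it can help to check which of these packages are importable. The sketch below uses only the standard library. The import names in `REQUIRED` are the usual ones for each package, but the mapping itself is an assumption: it omits CLIP and the LLM clients because the README does not say which packages provide them.

```python
import importlib.util
from typing import Dict, List

# Display name -> top-level import name (assumed; adjust to your environment).
REQUIRED: Dict[str, str] = {
    "PyTorch": "torch",
    "Diffusers": "diffusers",
    "Transformers": "transformers",
    "Pillow": "PIL",
    "NumPy": "numpy",
    "Pandas": "pandas",
    "NudeNet": "nudenet",
}


def missing_dependencies(modules: Dict[str, str] = REQUIRED) -> List[str]:
    """Return display names of packages whose import name cannot be found."""
    return [
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```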

## Research Applications

This framework is designed for:

- **Defense Mechanism Evaluation**: Assessing the robustness of content filtering systems
- **Adversarial Robustness Research**: Understanding vulnerabilities in diffusion models
- **Content Safety Research**: Developing more robust safety mechanisms
- **AI Safety Evaluation**: Comprehensive testing of AI system defenses

## Ethical Considerations

This research tool is intended for academic and research purposes to improve the robustness and safety of AI systems. Users should ensure compliance with applicable laws and ethical guidelines when conducting research with this framework.

---

*This framework represents a comprehensive approach to evaluating and understanding defense mechanisms in diffusion models, contributing to the broader goal of developing more robust and safe AI systems.*
