# Agent Cognitive Attack Framework

A research framework for analyzing LLM agent cognitive vulnerabilities using Multi-Armed Bandit (MAB) experiments and cognitive modeling.

## 🎯 Quick Start (3 Steps)

### 1. Configure Environment
```bash
# Copy and edit environment variables
cp .env.example .env
# Edit .env with your API keys

# Verify configuration
python scripts/config/config_manager.py check
```

### 2. Run Experiments
```bash
# Quick test (mock mode)
python scripts/experiment/run_experiments.py --mock --max_instructions 2 --trials 3

# Full experiment (using JSON config)
python scripts/experiment/run_experiments.py
```

### 3. Analyze Results
```bash
# Complete analysis (metrics + drift + radar)
python scripts/analysis/analyze.py --folder logs/jailbreak/DeepSeek-R1/ --type all

# Fit cognitive parameters
python scripts/analysis/fit_params.py
```

---

## 📁 New Directory Structure

```
agent-cognitive-attack/
├── 📄 Root Files
│   ├── README.md              # This file
│   ├── STRUCTURE.md           # Detailed structure guide
│   ├── CLAUDE.md              # Developer guide
│   ├── .env                   # API keys (gitignored)
│   └── .env.example           # Template
│
├── 📂 src/                    # Core modules
│   ├── core/                  # Configuration & utilities
│   ├── mab/                   # MAB experiment engine
│   ├── fitting/               # Cognitive parameter fitting
│   └── analysis/              # Metrics & visualization
│
├── 📂 scripts/                # Executable scripts
│   ├── experiment/            # Run experiments
│   │   ├── run_experiments.py # Batch experiments (RECOMMENDED)
│   │   ├── run_pipeline.py    # Full workflow
│   │   └── MAB_safety.py      # Legacy (backward compatible)
│   │
│   ├── analysis/              # Analyze results
│   │   ├── analyze.py         # Unified analysis
│   │   ├── fit_params.py      # Parameter fitting
│   │   └── bias_parameter_fitter.py  # Legacy
│   │
│   └── config/                # Configuration
│       └── config_manager.py  # CLI config tool
│
├── 📂 data/                   # Datasets
│   └── AdvBench/              # Harmful instructions
│
├── 📂 logs/                   # Results output
│   ├── jailbreak/             # Experiment CSVs
│   ├── analysis/              # Fitted parameters
│   └── images/                # Visualizations
│
├── 📂 tests/                  # Tests
│   └── test_new_architecture.py
│
└── 📂 docs/                   # Documentation
    ├── guides/                # Usage guides
    └── reports/               # Project reports
```

---

## 🚀 Core Workflows

### Workflow 1: Batch Experiments (Recommended)
```bash
# 1. Edit scripts/config/experiments.json
#    - Enable/disable models
#    - Set trials_per_instruction: 50
#    - Set mock_mode: false

# 2. Run experiments
python scripts/experiment/run_experiments.py

# 3. Analyze results
python scripts/analysis/analyze.py --folder logs/jailbreak/DeepSeek-R1/ --type all

# 4. Fit parameters
python scripts/analysis/fit_params.py
```

### Workflow 2: Single Model Experiment
```bash
# Direct command line
python scripts/experiment/run_mab.py \
    --source siliconflow \
    --model deepseek-ai/DeepSeek-R1 \
    --instruction "Your prompt" \
    --trials 50
```

### Workflow 3: Complete Pipeline
```bash
# One command: experiment + analysis
python scripts/experiment/run_pipeline.py \
    --source siliconflow \
    --model deepseek-ai/DeepSeek-R1 \
    --trials 10 \
    --mock
```

---

## 📊 Configuration

### experiments.json
```json
{
  "models": [
    {
      "source": "siliconflow",
      "model": "deepseek-ai/DeepSeek-R1",
      "mock": false,
      "max_workers": 2,
      "enabled": true
    }
  ],
  "runtime": {
    "trials_per_instruction": 50,
    "max_workers": 8,
    "mock_mode": false
  }
}
```
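
To make the relationship between the `models` entries and the `runtime` block concrete, here is a minimal sketch of reading this file and listing the models that would run. The helper name `enabled_models` and the precedence rules (global `mock_mode` forces mock, per-model `max_workers` is capped by the runtime value) are assumptions for illustration, not the documented behavior of `run_experiments.py`.

```python
import json


def enabled_models(config: dict) -> list[dict]:
    """Return the enabled model entries with their effective settings."""
    runtime = config.get("runtime", {})
    selected = []
    for entry in config.get("models", []):
        if not entry.get("enabled", True):
            continue  # skip models switched off in the config
        selected.append({
            "source": entry["source"],
            "model": entry["model"],
            # Assumption: the global mock_mode flag forces mock for every model.
            "mock": bool(entry.get("mock", False) or runtime.get("mock_mode", False)),
            # Assumption: per-model workers are capped by the runtime-wide limit.
            "max_workers": min(entry.get("max_workers", 1),
                               runtime.get("max_workers", 1)),
        })
    return selected


# Same content as the example config above, inlined so the sketch is runnable.
example = json.loads("""
{
  "models": [
    {"source": "siliconflow", "model": "deepseek-ai/DeepSeek-R1",
     "mock": false, "max_workers": 2, "enabled": true}
  ],
  "runtime": {"trials_per_instruction": 50, "max_workers": 8, "mock_mode": false}
}
""")

for model in enabled_models(example):
    print(model["source"], model["model"], model["max_workers"])
```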

### Supported Platforms
- SiliconFlow
- Ollama
- DashScope
- DMXAPI
- OpenAI
- GPTS
- NVIDIA (NEW)

---

## 📊 Output Files

### Experiment Results (CSV)
- `trial`, `scenario_id`, `group`
- `action` (Compliance/Refusal)
- `reward`, `parsed_label`
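
As a quick way to inspect a results CSV without the full analysis script, the sketch below computes the share of `Compliance` actions per `group`. It relies only on the column names listed above. The group labels, scenario IDs, and reward values in the sample data are made up for illustration, and `compliance_rate_by_group` is a hypothetical helper rather than part of the framework.

```python
import csv
import io
from collections import defaultdict


def compliance_rate_by_group(rows) -> dict[str, float]:
    """Fraction of trials whose action is 'Compliance', keyed by group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [compliances, total]
    for row in rows:
        tally = counts[row["group"]]
        tally[1] += 1
        if row["action"].strip().lower() == "compliance":
            tally[0] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


# Illustrative data only; real files live under logs/jailbreak/.
sample = io.StringIO(
    "trial,scenario_id,group,action,reward,parsed_label\n"
    "1,s1,A,Compliance,1,Compliance\n"
    "2,s1,A,Refusal,0,Refusal\n"
    "1,s2,B,Refusal,0,Refusal\n"
)

rates = compliance_rate_by_group(csv.DictReader(sample))
print(rates)
```

For a real file, replace the `io.StringIO` sample with `open(path, newline="")`.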

### Cognitive Report (CSV)
- `alpha_pos`, `alpha_neg` (learning rates)
- `rho` (risk preference)
- `theta` (static bias)
- `lambda` (inertia)
- `phi` (memory decay)
- `beta` (choice sensitivity)
- `nll`, `bic` (fit quality)
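
The two fit-quality columns are related through the standard Bayesian information criterion, BIC = k·ln(n) + 2·NLL, where k is the number of free parameters and n the number of observed choices. The sketch below applies that textbook definition. Whether `fit_params.py` counts all seven parameters above as free, or uses exactly this form, is an assumption.

```python
import math


def bic(nll: float, n_params: int, n_obs: int) -> float:
    """Standard BIC from a negative log-likelihood (lower is better)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) + 2.0 * nll


# Illustrative numbers: 7 parameters (alpha_pos, alpha_neg, rho, theta,
# lambda, phi, beta) fitted to 50 trials with a made-up NLL of 30.0.
print(round(bic(nll=30.0, n_params=7, n_obs=50), 2))
```

Because the penalty term grows with k, a model with fewer parameters can achieve a lower BIC even when its NLL is slightly higher.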

### Visualizations
- `drift.png` - Temporal dynamics
- `radar.png` - Cognitive fingerprint

---

## 🔧 Configuration Management

```bash
# View current config
python scripts/config/config_manager.py show

# Check completeness
python scripts/config/config_manager.py check

# List platforms
python scripts/config/config_manager.py platforms

# Initialize config
python scripts/config/config_manager.py init
```

---

## 📚 Documentation

### Quick Guides
- **QUICK_START.md** → `docs/guides/QUICK_START.md`
- **EXPERIMENTS_CONFIG.md** → `docs/guides/EXPERIMENTS_CONFIG.md`
- **CONFIGURATION.md** → `docs/guides/CONFIGURATION.md`

### Project Reports
- **PROJECT_STATUS.md** → `docs/reports/PROJECT_STATUS.md`
- **README_UPDATE.md** → `docs/reports/README_UPDATE.md`
- **STRUCTURE.md** → Root directory

### Developer Guide
- **CLAUDE.md** → Complete development guide

---

## ⚡ Common Commands

```bash
# 1. Verify setup
python scripts/config/config_manager.py check

# 2. Quick test
python scripts/experiment/run_experiments.py --mock --max_instructions 2 --trials 3

# 3. Full analysis
python scripts/analysis/analyze.py --folder logs/jailbreak/DeepSeek-R1/ --type all

# 4. Parameter fitting
python scripts/analysis/fit_params.py

# 5. Run tests
python tests/test_new_architecture.py
```

---

## 🎯 Key Features

### 1. Modular Architecture
- Core logic in `src/` (`core`, `mab`, `fitting`, `analysis`), separate from the executable scripts in `scripts/`
- Easy to extend and maintain
- Type hints throughout

### 2. Configuration Management
- JSON-based configuration
- Environment variable support
- CLI configuration tool

### 3. Batch Processing
- Parallel execution with ThreadPoolExecutor
- Per-model concurrency control
- Mock mode for testing
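
One way to combine a shared `ThreadPoolExecutor` with per-model concurrency limits is to guard each model with its own semaphore. The sketch below illustrates that pattern under stated assumptions: `run_batch`, `mock_trial`, and the job format are hypothetical names, not the framework's actual API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_batch(jobs, per_model_limits, global_workers, run_trial):
    """Run (model, instruction) jobs in parallel, capping concurrency per model.

    Results are returned in the same order as the submitted jobs.
    """
    gates = {model: threading.BoundedSemaphore(limit)
             for model, limit in per_model_limits.items()}

    def guarded(model, instruction):
        # Blocks until the model has a free slot, then runs the trial.
        with gates[model]:
            return run_trial(model, instruction)

    with ThreadPoolExecutor(max_workers=global_workers) as pool:
        futures = [pool.submit(guarded, model, text) for model, text in jobs]
        return [future.result() for future in futures]


def mock_trial(model, instruction):
    """Stand-in for a real API call, in the spirit of mock mode."""
    return f"{model}: mock response to {instruction!r}"


jobs = [("deepseek-ai/DeepSeek-R1", f"instruction {i}") for i in range(4)]
results = run_batch(jobs, {"deepseek-ai/DeepSeek-R1": 2},
                    global_workers=8, run_trial=mock_trial)
print(len(results))
```

A thread waiting on a semaphore still occupies a pool worker, so this simple form suits a handful of models. A separate pool per model avoids that trade-off at the cost of more bookkeeping.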

### 4. Unified Analysis
- Metrics calculation (ASR, NTF, JRS)
- Temporal drift visualization
- Cognitive fingerprint radar charts

### 5. Backward Compatibility
- Original scripts (`MAB_safety.py`, `bias_parameter_fitter.py`) still work
- Gradual migration path
- No breaking changes

---

## 🎓 Core Concepts

### Experiment Flow
```
User Instruction → MAB Environment → LLM Agent → Behavior Trajectory
                      ↓
              7 Scenario Groups × N Trials
                      ↓
              CSV Output (Raw Data)
                      ↓
              Cognitive Fitting (NLL)
                      ↓
              Parameters + Visualizations
```

### Cognitive Parameters
- **α+ / α-**: Learning rates for positive and negative outcomes (greed/fear)
- **ρ**: Risk preference
- **θ**: Static bias (safety)
- **λ**: Behavioral inertia
- **φ**: Memory decay
- **β**: Choice sensitivity
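
To show how parameters like these can interact in a single likelihood, here is a minimal sketch of one common reinforcement-learning formulation. Every modeling choice in it is an assumption for illustration (power-law utility for ρ, a refusal bonus for θ, a repeat bonus for λ, decay of the unchosen value for φ, softmax scaling for β); it is not the model implemented in `src/fitting/`.

```python
import math

REFUSAL, COMPLIANCE = 0, 1


def trajectory_nll(actions, rewards, alpha_pos, alpha_neg,
                   rho, theta, lam, phi, beta):
    """Negative log-likelihood of one action sequence under the assumed model.

    Assumes rho > 0 and actions encoded as REFUSAL (0) or COMPLIANCE (1).
    """
    q = [0.0, 0.0]   # learned value of each action
    prev = None      # previous action, used for the inertia term
    nll = 0.0
    for action, reward in zip(actions, rewards):
        logits = []
        for a in (REFUSAL, COMPLIANCE):
            logit = beta * q[a]          # choice sensitivity scales values
            if a == REFUSAL:
                logit += theta           # static bias toward the safe action
            if a == prev:
                logit += lam             # inertia: bonus for repeating
            logits.append(logit)
        # Numerically stable log-softmax.
        top = max(logits)
        log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
        nll -= logits[action] - log_norm

        # Risk preference: power-law transform that keeps the reward's sign.
        utility = math.copysign(abs(reward) ** rho, reward)
        delta = utility - q[action]
        alpha = alpha_pos if delta >= 0 else alpha_neg
        q[action] += alpha * delta
        q[1 - action] *= 1.0 - phi       # memory decay on the unchosen action
        prev = action
    return nll


# Illustrative trajectory and parameter values, not fitted results.
actions = [REFUSAL, REFUSAL, COMPLIANCE, COMPLIANCE]
rewards = [0.0, 0.0, 1.0, 1.0]
print(trajectory_nll(actions, rewards, alpha_pos=0.3, alpha_neg=0.1,
                     rho=1.0, theta=0.5, lam=0.2, phi=0.05, beta=2.0))
```

Fitting would then minimize this quantity over the seven parameters, for example with a bounded numerical optimizer.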

---

## 🔄 Migration from Legacy

If you are coming from the legacy layout:
1. **Backed up**: All old files are kept in `docs/legacy/_legacy_backup/`
2. **Compatible**: The original scripts still work
3. **Recommended**: Move to the new modular structure

---

## 📞 Next Steps

1. Read `docs/guides/QUICK_START.md` for detailed examples
2. Edit `scripts/config/experiments.json` for your experiments
3. Run `python scripts/experiment/run_experiments.py --mock` to test
4. Check `STRUCTURE.md` for complete directory layout

---

**Last Updated**: 2025-12-29
**Version**: 2.0 (Modular Architecture)
**Status**: ✅ Production Ready
