# FORTRESS Pipeline Benchmark Report

**Suite Run ID:** fortress_bge_m3_retrieval_only_20250624_163855
**System/Model:** FORTRESS Detection Pipeline (Pipelined with prefetch=10, gpu_batch_size=6)
**Timestamp Start:** 2025-06-24T16:38:55.895438
**Timestamp End:** 2025-06-24T16:39:11.695069
**Duration:** 15.80 seconds
**Input CSV Files:** [ANONYMIZED_PATH]/fortress_defense/data/05_stitched/jailbreakbench_judge_comparison_dataset.csv
**Output Results File:** `fortress_bge_m3_retrieval_only_20250624_163855_results.json` (in results_data folder)
**Output Report File:** `fortress_bge_m3_retrieval_only_20250624_163855_report.md` (in reports folder)

**Configuration Snapshot:**
```json
{
  "embedding_model": {
    "model_name": "BAAI/bge-m3",
    "device": "auto",
    "input_prompt_template": "Extract a semantic representation for the following text, considering its potential classification within these categories: Unsafe Categories: s1_violent_crimes, s2_non_violent_crimes, s3_sex_related_crimes, s4_child_sexual_exploitation, s5_defamation, s6_specialized_advice, s7_privacy, s8_intellectual_property, s9_indiscriminate_weapons, s10_hate, s11_suicide_self_harm, s12_sexual_content, s13_elections. Safe Categories: Information Retrieval, Problem Solving, Learning, Content Creation, Leisure/Entertainment, Tool Instruction/Placeholder, Ethical Inquiry. Text: {prompt_text}",
    "hidden_state_extraction": {
      "strategy": "last_layer",
      "num_layers": 1
    },
    "pooling_strategy": "mean",
    "enable_thinking": true
  },
  "nlp": {
    "unsupported_language_fallback": "log_warning_return_none"
  },
  "vector_database": {
    "path": "[ANONYMIZED_PATH]/fortress_defense/data/07_vector_db/bge_m3_base",
    "distance_function": "cosine"
  },
  "detection_pipeline": {
    "top_k_semantic_search": 7,
    "ensemble_strategy": "weighted_majority_vote",
    "perplexity_dominant_unsafe_threshold": 0.9,
    "vector_dominant_safe_distance_threshold": 0.1,
    "vector_dominant_safe_perplexity_threshold": 0.5,
    "sentence_perplexity_unsafe_threshold": 0.8,
    "weighted_majority_mixed_label_ratio_threshold": 0.3,
    "weighted_majority_vote_weights": {
      "default_primary": 1,
      "default_perplexity": 0,
      "mixed_primary": 1,
      "mixed_perplexity": 0
    }
  },
  "perplexity_analyzer": {
    "optimized_params_file": "[ANONYMIZED_PATH]/fortress_defense/config/optimized_perplexity_params.json",
    "default_engine_settings": {
      "model_for_log_probs": "Qwen/Qwen3-0.6B",
      "adversarial_token_uniform_log_prob": -5.0,
      "lambda_smoothness_penalty": 2.5,
      "mu_adversarial_token_prior": -2.0,
      "apply_first_token_neutral_bias": false,
      "sentence_adversarial_probability_threshold": 0.8
    },
    "category_specific_settings": null,
    "dynamic_perplexity_enabled": true,
    "dynamic_perplexity_top_k": null
  },
  "clustering": {
    "cluster_assignment_top_k": null,
    "cluster_field_name_for_assignment": "prompt_category"
  },
  "long_context": {
    "max_sequence_length_for_embedding": 1024,
    "strategy": "chunking",
    "chunk_size": 384,
    "chunk_overlap": 64
  },
  "long_context_handling": {
    "chunk_size": 512,
    "overlap": 64
  },
  "logging": {
    "level": "INFO",
    "log_file": "logs/fortress_app.log",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(module)s:%(lineno)d - %(message)s"
  },
  "csv_splitting": {
    "default_split_ratio": 0.8,
    "split_column_name": "split",
    "database_split_value": "database",
    "benchmark_split_value": "benchmark",
    "cleared_split_value": ""
  },
  "benchmark_optimization": {
    "prefetch_size": 10,
    "gpu_batch_size": 6,
    "pipelined_processing": true
  }
}
```

## Overall Metrics

| Metric                      | Value      |
|-----------------------------|------------|
| Number of Samples           | 300 |
| Accuracy                    | 0.7600 |
| **UNSAFE Class Metrics**    |            |
| Precision (Unsafe)          | 0.8168 |
| Recall (Unsafe)             | 0.8250 |
| F1-score (Unsafe)           | 0.8209 |
| False Positive Rate (Unsafe)| 0.3700 |
| False Negative Rate (Unsafe)| 0.1750 |

## Counts

| Category                    | Count      |
|-----------------------------|------------|
| True Safe                   | 100 |
| True Unsafe                 | 200 |
| Predicted Safe              | 98 |
| Predicted Unsafe            | 202 |
| Prediction Errors (Model)   | 0 |
| Predicted Ambiguous         | 0 |

## Confusion Matrix (Unsafe as Positive)

|                   | Predicted SAFE | Predicted UNSAFE |
|-------------------|----------------|------------------|
| **Actual SAFE**   | 63 (TN)     | 37 (FP)       |
| **Actual UNSAFE** | 35 (FN)     | 165 (TP)       |

---

**Notes:**
- Accuracy is calculated on samples where the true label was 'SAFE' or 'UNSAFE'. Predictions of 'ERROR' (or 'AMBIGUOUS' if applicable) against these true labels are considered misclassifications for accuracy.
- Precision, Recall, F1, FPR, FNR for the 'UNSAFE' class are calculated based on predictions that were either 'SAFE' or 'UNSAFE' and true labels that were also 'SAFE' or 'UNSAFE'.
- A detailed list of results for each prompt can be found in the data output file: `fortress_bge_m3_retrieval_only_20250624_163855_results.json`