# FORTRESS (Fast, Orchestrated, Training-free Retrieval Ensemble for Scalable Security)
## Warning:
This project is a research prototype and is not intended for production use. It is provided "as is" without any warranties or guarantees. While the code is designed to be robust, it may not handle all edge cases or errors that could occur in a production environment. Use at your own risk.


## Prerequisites
This project uses Poetry for dependency management. To set up the environment, make sure Poetry is installed, then run:

```bash
poetry install
```
Also make sure the system has a CUDA-enabled GPU and the NVIDIA driver installed.

## Folder Structure
The project is organized into several directories:

`benchmarks/`: 
1. Contains results for the baselines and for different configurations of the FORTRESS system, including Markdown overview reports in `benchmarks/reports/{benchmark_name}` and more detailed JSON results in `benchmarks/results_data/{benchmark_name}`.
2. This directory also includes the results of the noise robustness and scalability benchmarks.

`configs/`: 
1. Contains configuration files for the FORTRESS system. Experiment-specific configurations are stored in `config/experiment_catalogue`; the file the system actually reads is `/config/settings.yaml`, which is swapped out for the selected experiment config at runtime.
2. This directory also includes the perplexity thresholds for the FORTRESS system.
3. `config/constants.py` stores the base project directory path among other definitions, which is used throughout the project to ensure consistent file paths.

`data/`:
1. The knowledge base for the FORTRESS system is stored at `data/07_vector_db`, with prebuilt vector databases for the default and expanded knowledge bases (for 4 models), as well as experiment-specific vector databases.
2. The dataset CSVs are all stored at `data/05_stitched` and share the same format, with the columns `original_prompt`, `label`, `split`, `source_file`, `[prompt_category]`, `[prompt_style]`.
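
As a rough illustration of this format, the sketch below checks a CSV header for the listed columns. The function name `check_stitched_columns` is hypothetical, the sample row is made up, and treating the bracketed columns as optional is an assumption based on the bracket notation above.

```python
import csv
import io

REQUIRED = ["original_prompt", "label", "split", "source_file"]
# Assumption: the bracketed columns in the README are optional.
OPTIONAL = ["prompt_category", "prompt_style"]


def check_stitched_columns(handle):
    """Return (missing_required, present_optional) for an open CSV handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    present = [c for c in OPTIONAL if c in header]
    return missing, present


# In-memory example; a real file would be opened with open(path, newline="").
sample = io.StringIO(
    "original_prompt,label,split,source_file,prompt_category\n"
    "How do I bake bread?,safe,database,example.csv,cooking\n"
)
missing, present = check_stitched_columns(sample)
print(missing, present)
```

An empty `missing` list means every required column is present.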

`fortress/`:
1. Contains the core implementation of the FORTRESS system, including the vector store, embedding model, retriever, and the ensemble system.
2. It's not recommended to modify the code in this directory unless you are familiar with the system's architecture.

`scripts/`:
1. Contains various scripts for data processing, analysis, and visualization.
2. The specific guides are provided in the later sections of this README.




## Running the Project


### Interactive Scripts
There are four main interactive scripts in the `scripts/` directory that you can run to interact with the FORTRESS system:

#### Data Ingestion: `scripts/cli_data_ingestion.py`

This script provides an interactive and queue-based system for ingesting datasets into the FORTRESS vector database. Key features include:

- **Interactive Menu:** Add new ingestion tasks, view and manage the ingestion queue, and process pending tasks through a user-friendly CLI (with optional Rich UI).
- **Queue Management:** Supports batching and queueing of multiple ingestion jobs, with persistent tracking of task status (pending, processing, completed, failed).
- **Parallel Processing:** Utilizes multiprocessing and threading to efficiently extract NLP features, generate embeddings, and insert records into the vector database.
- **Automatic Settings Management:** Automatically updates `settings.yaml` to match the selected embedding model and database path for each task, with backup and restore for safety.
- **Database Documentation:** Generates a `README.md` in each new database folder, documenting the embedding model and source CSVs used for reproducibility.
- **Legacy Mode:** Supports direct ingestion of CSV files via command-line arguments for backward compatibility.

Run this script first to build all the databases: start it, then choose option 3 to process all pending tasks in the queue. This ensures that all datasets are ingested into the vector database before you run any benchmarks or experiments.
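
The queue behaviour described above (persistent task status, processing of all pending tasks) can be pictured with the minimal sketch below. It is not the script's implementation: the JSON file layout, the function names, and `demo_handler` are all assumptions made for illustration.

```python
import json
import os
import tempfile


def load_queue(path):
    """Load the task list from disk, or start empty if the file is absent."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_queue(path, tasks):
    """Write the full task list back to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=2)


def process_pending(path, handler):
    """Run handler on every pending task, persisting status after each change."""
    tasks = load_queue(path)
    for task in tasks:
        if task["status"] != "pending":
            continue
        task["status"] = "processing"
        save_queue(path, tasks)
        try:
            handler(task)
            task["status"] = "completed"
        except Exception as exc:
            task["status"] = "failed"
            task["error"] = str(exc)
        save_queue(path, tasks)
    return tasks


def demo_handler(task):
    # Stand-in for real ingestion work; fails when no CSV name is given.
    if not task["csv"]:
        raise ValueError("no CSV given")


queue_path = os.path.join(tempfile.mkdtemp(), "queue.json")
save_queue(queue_path, [
    {"csv": "a.csv", "status": "pending"},
    {"csv": "", "status": "pending"},
    {"csv": "b.csv", "status": "completed"},
])
result = process_pending(queue_path, demo_handler)
print([t["status"] for t in result])
```

Saving after every status change means an interrupted run leaves a readable record of which task was in progress.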

#### Benchmarking: `scripts/cli_benchmark.py`
This script is the consolidated entry point for running and viewing benchmark results on FORTRESS and other baselines. It provides a unified, interactive CLI interface for:

- **Running Benchmarks:** Select one or more models and benchmark datasets, then launch batch runs with automatic configuration management. The tool supports both FORTRESS and external baseline models, handling settings swaps and output organization.
- **Viewing Results:** Explore consolidated tables of F1 scores, accuracy, and latency across all models and benchmarks. The script can also display per-language performance (for multilingual benchmarks), export full results to CSV, and highlight best-performing models.
- **Exporting Data:** Export misclassified prompts for error analysis or generate comprehensive CSV reports for further analysis.
- **Interactive Selection:** Uses rich terminal UIs for selecting models, benchmarks, and specific model-benchmark combinations, making it easy to customize runs and analyses.
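
For readers who want to reproduce the reported scores from exported predictions, here is a minimal sketch of the standard binary-classification metrics. Treating `unsafe` as the positive class and the label strings themselves are assumptions; the benchmark script may encode labels differently.

```python
def binary_metrics(y_true, y_pred, positive="unsafe"):
    """Compute F1, accuracy, FPR and FNR from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "f1": f1,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # safe prompts flagged
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # unsafe prompts missed
    }


y_true = ["unsafe", "unsafe", "safe", "safe"]
y_pred = ["unsafe", "safe", "safe", "unsafe"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```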

#### Parameter Tuning: `scripts/cli_param_experiment.py`
This script provides an interactive CLI tool for running parameter sensitivity experiments on the FORTRESS system. Key features include:

- **Interactive Configuration:** Create or select experiment configurations, specifying which numeric parameters to tune (e.g., top-k, primary/mixed weights), their ranges, and increments.
- **Automated Grid Search:** Runs all combinations of selected parameter values across chosen benchmarks, saving results and restoring original settings after completion.
- **Result Organization:** Automatically organizes output files and experiment summaries for easy analysis.
- **Rich Visualization:** View experiment results in tables, identify best configurations, and generate plots (heatmaps, line plots) to visualize parameter impacts.
- **Summary Rebuilding:** Rebuild experiment summaries from raw result files if needed.
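
The grid search described above amounts to enumerating the Cartesian product of the chosen parameter values. A minimal sketch follows; the parameter names `top_k` and `primary_weight` and their ranges are hypothetical, and the helper names are not taken from the script.

```python
import itertools


def frange(start, stop, step):
    """Inclusive list of floats from start to stop, rounded to limit drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def grid(param_ranges):
    """Yield one dict per combination of parameter values."""
    names = list(param_ranges)
    for values in itertools.product(*(param_ranges[n] for n in names)):
        yield dict(zip(names, values))


# Hypothetical parameters and ranges, for illustration only.
ranges = {
    "top_k": [3, 5, 7],
    "primary_weight": frange(0.5, 0.9, 0.2),
}
combos = list(grid(ranges))
print(len(combos))
print(combos[0])
```

The number of runs grows multiplicatively with each added parameter, so narrow ranges or coarse increments keep an experiment manageable.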



#### System Testing: `scripts/cli_fortress.py`
This script provides an interactive CLI for testing the FORTRESS detection pipeline on individual prompts. It allows users to enter prompts one at a time and view detailed detection results, including the final decision, confidence scores, justifications, and the top-k most similar documents retrieved from the vector database. The CLI supports custom collection selection, adjustable logging levels, and displays rich, color-coded output using the Rich library. This tool is ideal for debugging, manual evaluation, and exploring how FORTRESS processes and classifies specific inputs in real time.



### Perplexity Calibration: `run_perplexity_calibration.py`

This script optimizes the perplexity analyzer parameters for each prompt category in the FORTRESS system using Bayesian optimization. It searches for the best settings to minimize the mean squared error (MSE) between predicted adversarial probabilities and the target (safe/unsafe) labels.

**Key Features:**
- Loads prompts from a CSV file, filtering for the `'database'` split.
- Computes token log probabilities for each prompt using the current embedding model.
- Supports both global and per-category optimization modes.
- Uses Bayesian optimization (`gp_minimize` from `skopt`) to tune parameters:
    - `adversarial_token_uniform_log_prob`
    - `lambda_smoothness_penalty`
    - `mu_adversarial_token_prior`
    - `apply_first_token_neutral_bias`
- Saves optimized parameters and thresholds to a JSON file for use in the FORTRESS pipeline.
- Uses settings from `settings.yaml` if available, or falls back to hardcoded defaults.
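
To make the optimization target concrete, the sketch below builds an MSE objective of the shape `gp_minimize` expects: a callable that takes a list of parameter values and returns a scalar. The scoring function `toy_adversarial_probability` and its two parameters are a toy stand-in invented for this example; they are not the FORTRESS perplexity analyzer or its real parameters.

```python
import math


def mse(predicted, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)


def toy_adversarial_probability(token_log_probs, threshold, scale):
    """Toy stand-in (NOT the FORTRESS analyzer): low mean log prob -> near 1."""
    mean_lp = sum(token_log_probs) / len(token_log_probs)
    return 1.0 / (1.0 + math.exp(scale * (mean_lp - threshold)))


def make_objective(prompts_log_probs, targets):
    """Return objective(params) -> MSE, suitable for a black-box optimizer."""
    def objective(params):
        threshold, scale = params
        preds = [toy_adversarial_probability(lp, threshold, scale)
                 for lp in prompts_log_probs]
        return mse(preds, targets)
    return objective


# Made-up token log probabilities: one fluent prompt, one high-perplexity prompt.
log_probs = [[-1.0, -1.2], [-6.0, -7.0]]
targets = [0.0, 1.0]  # 0 = safe, 1 = unsafe (assumed encoding)
objective = make_objective(log_probs, targets)

good = objective([-4.0, 2.0])  # threshold separates the two prompts
flat = objective([-4.0, 0.0])  # scale 0 predicts 0.5 for everything
print(good < flat)
```

An optimizer then searches the parameter space for the values that give the lowest objective.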

**Usage Example:**
```bash
python run_perplexity_calibration.py \
        --input_csv data/05_stitched/your_dataset.csv \
        --output_json configs/perplexity_thresholds/optimized_params.json \
        --n_calls 30 \
        --n_initial_points 10 \
        --mode per_category
```
- `--mode` can be `per_category` (default) or `global`.

**Typical Workflow:**
1. Prepare your dataset CSV with labeled prompts and categories.
2. Run the script as shown above.
3. Use the generated JSON file to update your FORTRESS configuration for improved detection performance.

**Notes:**
- Requires a CUDA-enabled GPU for efficient embedding model inference.
- We recommend setting `--mode` to `per_category` for more tailored results; `global` was only used for the ablation study in the paper.


### Running Experiments
#### Scalability Experiment: `scripts/run_scability_experiment.py`
This script evaluates the scalability of the FORTRESS system by measuring detection performance and latency as the size of the vector database is systematically reduced. It works by repeatedly removing random subsets of documents from the ChromaDB vector store and benchmarking the system on several datasets at each step.

**Usage Example:**
```bash
python scripts/run_scability_experiment.py --steps 16 --runs-per-step 3 --db-path data/07_vector_db/gemma3_1b_exp_scale_experiment
```

**Notes:**
- Requires a CUDA-enabled GPU and a ChromaDB vector store.
- The script is safe to run: it creates a backup and restores the database after completion.
- Useful for understanding how FORTRESS performance scales with knowledge base size and for generating figures for publication.
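
The shrinking procedure can be sketched as below. This is an illustration only: it works on a plain list of document IDs rather than a ChromaDB collection, and removing an equal-sized random chunk at every step is an assumption about the schedule.

```python
import random


def shrink_schedule(doc_ids, steps, seed=0):
    """Yield progressively smaller random subsets of doc_ids, one per step."""
    rng = random.Random(seed)
    remaining = list(doc_ids)
    chunk = len(remaining) // steps  # documents removed after each step
    for _ in range(steps):
        yield list(remaining)        # benchmark would run on this subset
        rng.shuffle(remaining)
        remaining = remaining[chunk:]


sizes = [len(subset) for subset in shrink_schedule(range(100), steps=4)]
print(sizes)
```

Each subset is nested inside the previous one, so any change in metrics between steps reflects only the removed documents.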


#### Noise Robustness Experiment: `scripts/run_noise_experiment.py`
This script evaluates the robustness of the FORTRESS system to label noise in the training data. It systematically injects varying levels of random label noise into the dataset, restores the vector database to a clean state before each run, and benchmarks system performance (F1, accuracy, FPR, FNR) at each noise level.

**Key Features:**
- **Automated Noise Injection:** Uses `inject_label_noise.py` to randomly flip a specified proportion of labels in the dataset for each run.
- **Multiple Runs per Noise Level:** Supports repeated runs with different random seeds to average out stochastic effects.
- **Benchmark Integration:** Runs the full FORTRESS benchmark pipeline after each noise injection, collecting detailed metrics.
- **Result Aggregation:** Aggregates results across runs and noise levels, saving both raw and summary CSVs.
- **Visualization:** Generates plots (PDF) showing how FORTRESS performance degrades as label noise increases.
- **Safe Database Handling:** Backs up and restores the vector database at each step to ensure reproducibility and prevent data corruption.
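
Random label flipping of the kind described above can be sketched as follows. This is not the code of `inject_label_noise.py`: the function name, the `safe`/`unsafe` label strings, and the rounding of the flip count are assumptions.

```python
import random


def flip_labels(labels, noise_level, seed, classes=("safe", "unsafe")):
    """Return a copy of labels with a noise_level fraction flipped at random."""
    rng = random.Random(seed)          # fixed seed makes each run repeatable
    n_flip = round(noise_level * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        noisy[i] = classes[1] if noisy[i] == classes[0] else classes[0]
    return noisy


clean = ["safe"] * 10 + ["unsafe"] * 10
noisy = flip_labels(clean, noise_level=0.25, seed=42)
changed = sum(a != b for a, b in zip(clean, noisy))
print(changed)
```

Sampling indices without replacement guarantees that exactly the requested share of labels changes, rather than flipping each label independently with that probability.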

**Usage Example:**
```bash
python scripts/run_noise_experiment.py \
    --noise-levels 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 \
    --num-runs 5 \
    --output-dir benchmarks/noise_experiment_results
```

**Typical Workflow:**
1. Ensure the clean vector database exists at the configured path.
2. Run the script with desired noise levels and number of runs.
3. Inspect the generated CSVs and plots in the output directory to analyze robustness.

#### Leave-One-Category-Out Experiment: `scripts/run_loco_experiment.py`
This script implements a **Leave-One-Category-Out (LOCO) cross-validation experiment** for FORTRESS. It systematically evaluates how well the system generalizes to each unsafe prompt category when that category is excluded from the vector database during training.

**How it works:**
- The dataset is split into `k` folds (default: 5).
- For each fold, and for each unsafe category:
    - All prompts of the held-out category are removed from the training pool and the vector database.
    - The system is benchmarked on the held-out category in the test fold.
    - Results (F1, recall, precision, sample count) are recorded for each fold/category combination.
- After all runs, results are aggregated and summarized per category and overall.
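
The fold and category loop above can be sketched as a generator of train/test splits. The row layout (dicts with `label` and `prompt_category` keys), the `unsafe` label string, and the round-robin fold assignment are assumptions for illustration, not the script's actual logic.

```python
import random


def loco_splits(rows, k=5, seed=0):
    """Yield (fold_index, category, train_rows, test_rows) for every pair."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    categories = sorted(
        {r["prompt_category"] for r in rows if r["label"] == "unsafe"}
    )
    for i, test_fold in enumerate(folds):
        train_pool = [r for j, fold in enumerate(folds) if j != i for r in fold]
        for cat in categories:
            # Remove the held-out category from everything the system may see.
            train = [r for r in train_pool if r["prompt_category"] != cat]
            # Evaluate only on held-out-category prompts in the test fold.
            test = [r for r in test_fold if r["prompt_category"] == cat]
            yield i, cat, train, test


# Made-up rows: two unsafe categories plus safe prompts.
rows = (
    [{"label": "unsafe", "prompt_category": "a"} for _ in range(10)]
    + [{"label": "unsafe", "prompt_category": "b"} for _ in range(10)]
    + [{"label": "safe", "prompt_category": "benign"} for _ in range(10)]
)
splits = list(loco_splits(rows, k=5))
print(len(splits))
```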

**Usage Example:**
```bash
python scripts/run_loco_experiment.py --k-folds 5 --output-dir benchmarks/loco_experiment_results
```

