## Summary
This subfolder contains the code for the main evaluation pipeline of the library. It runs the codecs, launches adversarial attacks on them, and evaluates their robustness and quality in both clear and adversarial scenarios.

---

## File overview

### Module dependency structure

```
main_eval_script.py (Entry Point)
├── config_parser.py
├── setup_modules.py
│   ├── defended_model.py
│   └── helpers.py (load_defence_params_json)
├── dataloaders.py
├── quality_evaluators.py
│   ├── metrics.py
│   ├── traditional_reference_codec.py
│   └── helpers.py (to_numpy, to_torch)
├── codec_scoring_methods.py
├── codec_losses.py
├── raw_data_scheme.py
│   └── quality_evaluators.py (FR_COLS, NR_COLS)
└── helpers.py
    └── color_transforms_255.py

External Dependencies:
├── run_config.yaml (primary config, available in /scripts/)
├── attack_presets_codecs.json (config, attack parameter presets)
├── defence_presets.json (config, defence parameter presets)
└── Dockerfile (deployment)
```


| File                               | Description                                                                                           |
| ---------------------------------- | ----------------------------------------------------------------------------------------------------- |
| `main_eval_script.py`              | Master script orchestrating the complete codec robustness evaluation pipeline.                       |
| `attack_presets_codecs.json`       | Configuration file defining attack parameters and presets for different codec evaluation scenarios.   |
| `defence_presets.json`             | Configuration file defining defensive preprocessing parameters and presets.                           |
| `Dockerfile`                       | Container configuration for reproducible evaluation environment setup.                               |
| `config_parser.py`                 | Configuration parsing utilities for loading experiment settings from YAML files.                     |
| `setup_modules.py`                 | Module initialization and dependency setup utilities for the evaluation framework.                   |
| `defended_model.py`                | Wrapper classes for applying defensive preprocessing to codec models.                           |
| `dataloaders.py`                   | Dataset loading utilities for images with PyTorch-compatible data structures.             |
| `raw_data_scheme.py`               | CSV data structure definitions for organizing evaluation results.                             |
| `codec_losses.py`                  | Collection of different differentiable adversarial attack objectives.              |
| `quality_evaluators.py`            | High-level classes for batch computation of image quality metrics.                             |
| `codec_scoring_methods.py`         | Methods for final codec robustness/quality score computation based on raw results.                   |
| `defence_scoring_methods.py`       | Metrics and scoring functions for evaluating defence effectiveness against adversarial attacks.      |
| `metrics.py`                       | Standardized quality metric implementations (PSNR, SSIM, VMAF, etc.) with unified interfaces.       |
| `helpers.py`                       | General utility functions and helpers used across the evaluation pipeline.                           |
| `color_transforms_255.py`          | PyTorch utilities for fast, differentiable color space conversion in the 0-255 range.               |
| `traditional_reference_codec.py`   | JPEG2000 baseline codec implementations for comparison against learned codecs.                       |

---

## Submodules overview

### `main_eval_script.py`

Master orchestration script that coordinates the complete codec robustness evaluation pipeline, integrating attack generation, codec evaluation, and comprehensive quality assessment.

* **Core evaluation functions**:
  * `evaluate_codec()` – performs comprehensive codec assessment on both clear and attacked images:
    * Runs defended and undefended codec variants in parallel
    * Handles JPEGAI dual-codec architecture (main + auxiliary models)
    * Collects timing, BPP, and quality metrics for all scenarios
    * Manages image saving and reference codec comparisons
  * `run_robustness_evaluation()` – coordinates full dataset evaluation:
    * Applies adversarial attacks with reproducible seeding
    * Iterates through image datasets with progress tracking
    * Aggregates results into structured DataFrames with proper metadata
* **Pipeline orchestration** – `test_main()` manages the complete experimental workflow:
  * Configuration loading and module setup (codecs, defences, attack presets)
  * Multi-preset evaluation loops (attack parameter sweeps)
  * Multi-dataset processing with independent result logging
  * Hierarchical CSV output (per-dataset + aggregate results)
  * Parallel evaluation for JPEGAI main codec when applicable
* **Data management**:
  * Structured directory creation for attacked/reconstructed image datasets
  * Frequency-controlled image dumping for qualitative inspection
  * Comprehensive timing statistics (attack time, codec inference time)
  * Metadata population ensuring traceability of all experimental conditions
* **Error handling** – robust attack failure detection with descriptive error messages to ensure evaluation continuity
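
The nested sweep and the frequency-controlled dumping described above can be pictured with a small standalone sketch. It is only an illustration under assumptions: `run_sweep`, `should_dump`, `dump_freq`, and the per-image seeding scheme are hypothetical names and choices, not the functions in `main_eval_script.py`.

```python
import random


def should_dump(index: int, dump_freq: int) -> bool:
    """Return True for every `dump_freq`-th image (hypothetical helper)."""
    return dump_freq > 0 and index % dump_freq == 0


def run_sweep(presets, datasets, dump_freq=2, seed=0):
    """Walk the preset x dataset x image grid and collect one row per image."""
    rows = []
    for preset in presets:
        for dataset_name, image_names in datasets.items():
            for index, image_name in enumerate(image_names):
                # A fresh, deterministically seeded generator per image keeps
                # the stand-in "attack randomness" identical across runs.
                rng = random.Random(f"{seed}-{preset}-{dataset_name}-{index}")
                rows.append({
                    "attack_preset": preset,
                    "test_dataset": dataset_name,
                    "image_name": image_name,
                    "noise_draw": rng.random(),  # placeholder for a real attack
                    "dumped": should_dump(index, dump_freq),
                })
    return rows


rows = run_sweep([0, 1], {"toy": ["a.png", "b.png", "c.png"]})
```

Each row carries the metadata needed to trace it back to its preset and dataset, which is the property the real pipeline relies on when aggregating per-dataset CSV files.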

> **Integration point**: This script serves as the primary entry point for the evaluation framework, consuming configuration from `config_parser`, utilizing all utility modules (`helpers`, `quality_evaluators`, `codec_scoring_methods`), and producing standardized outputs ready for analysis and reporting.


#### Notable implementation details / comments

* **JPEG-AI branch** – the script handles 0–255 *YCbCr* tensors and converts them back to 0–1 RGB for metric calculation (`is_jpegai` flag).
* **Config-driven** – codec hyper-parameters come from `src/config.json`; attack/defence presets from JSON or CSV make sweeping experiments reproducible.
* **Timing stats** – average inference time (`mean_time`) and attack time are appended to the score tables for quick throughput checks.

---

### `config_parser.py`

Configuration management utilities for loading and validating experimental settings from YAML files with command-line override support.

* **Core function** – `get_run_config()` parses YAML configuration files and merges them with CLI arguments (codec, attack, attack_preset, loss_name).
* **Validation** – `validate_config()` ensures all required parameters are present and correctly typed:
  * Dataset paths and names consistency
  * Required string fields (codec, attack, loss_name) 
  * Numeric constraints (positive frequencies, valid device specification)
  * Optional path validation for save directories
* **Default handling** – automatically sets sensible defaults for optional parameters:
  * Save paths → `None` (disabled)
  * Device → `'cuda:0'`
  * Frequencies → `1`
  * Boolean flags → `False`
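
As a rough sketch of the merge order described above (defaults, then YAML values, then CLI overrides), the snippet below uses a plain dictionary in place of a parsed YAML file. The optional key names (`save_path`, `dump_freq`, `save_images`) and the function `merge_config` are assumptions made for illustration; only the four CLI argument names come from this README.

```python
import argparse

# Defaults for optional parameters, mirroring the list above.
# The key names here are hypothetical.
OPTIONAL_DEFAULTS = {
    "save_path": None,      # saving disabled
    "device": "cuda:0",
    "dump_freq": 1,
    "save_images": False,
}
CLI_KEYS = ("codec", "attack", "attack_preset", "loss_name")


def merge_config(yaml_values: dict, argv: list) -> dict:
    """Merge defaults < YAML values < CLI arguments (highest priority)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--codec")
    parser.add_argument("--attack")
    parser.add_argument("--attack_preset", type=int)
    parser.add_argument("--loss_name")
    args = parser.parse_args(argv)

    config = {**OPTIONAL_DEFAULTS, **yaml_values}
    for key in CLI_KEYS:
        value = getattr(args, key)
        if value is not None:  # only override when the flag was given
            config[key] = value
    return config


# "codec_a", "attack_a", "loss_a" are placeholder values, not real presets.
config = merge_config(
    {"codec": "codec_a", "attack": "attack_a", "loss_name": "loss_a"},
    ["--codec", "codec_b"],
)
```

Here `config["codec"]` ends up as `"codec_b"` because the CLI flag wins, while `attack` and `loss_name` keep their YAML values and `device` falls back to its default.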

> **Tip**: CLI arguments always override YAML values, making it easy to sweep parameters without editing config files.

---

### `setup_modules.py`

Module initialization and dependency setup utilities that dynamically load codec models and defensive preprocessing from configuration.

* **Codec setup** – `setup_codec()` uses `importlib` to dynamically load codec modules, handles JPEGAI-specific dual-model setup (main + auxiliary), and extracts input/output range specifications.
* **Defence integration** – `setup_defence()` loads defensive preprocessing modules, applies preset parameters from JSON configs, and wraps codecs with `defended_model.CodecModel`.
* **Environment prep** – `setup_files()` creates necessary directories and sets environment variables for loss function specifications.
* **Preset management** – `setup_attack_presets()` determines which attack parameter presets to run based on configuration flags:
  * Single preset → `[attack_preset]`
  * Default only → `[-1]` (no preset file)
  * All presets → `[0, 1, 2]`
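
The three cases above could be expressed roughly as follows. The function name, the flag names (`attack_preset`, `run_all_presets`), and the precedence between them are assumptions for this sketch; the actual logic in `setup_attack_presets()` may differ.

```python
def select_attack_presets(attack_preset=None, run_all_presets=False,
                          all_presets=(0, 1, 2)):
    """Return the list of preset ids to evaluate.

    Assumed precedence: "all presets" flag, then an explicit single preset,
    then the default marker -1 (meaning "no preset file").
    """
    if run_all_presets:
        return list(all_presets)
    if attack_preset is not None:
        return [attack_preset]
    return [-1]
```

For example, `select_attack_presets(attack_preset=1)` yields `[1]`, and calling it with no arguments yields `[-1]`.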

> **Note**: Automatically handles JPEGAI's dual-codec architecture and runs defence setup scripts if present.

---

### `defended_model.py`

Wrapper classes that integrate defensive preprocessing modules with codec and metric models, enabling seamless application of defences during evaluation.

* **Codec wrapper** – `CodecModel` wraps neural compression models with defensive preprocessing:
  * Applies defence preprocessing before codec encoding
  * Applies defence postprocessing after codec decoding
  * Preserves codec attributes (input_range, output_range, output_cspace)
  * Supports bidirectional defence integration (sets codec reference in defence if supported)
* **Transparent integration** – both wrappers maintain the same interface as the original models while adding defence capabilities:
  * Forward pass routing through defence → model → postprocessing (codec only)
  * Automatic attribute forwarding for compatibility with existing evaluation code

> **Design pattern**: These wrappers implement the decorator pattern, allowing any defence to be applied to any codec or metric without modifying the underlying model implementations. Used by `setup_modules.py` to create defended model variants based on configuration.
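
A framework-free sketch of the wrapping idea is shown below. It is not the `CodecModel` class itself: `DefendedCodec`, `IdentityCodec`, `ClampDefence`, and the `preprocess`/`postprocess` method names are hypothetical stand-ins, and plain lists replace tensors to keep the example self-contained.

```python
class IdentityCodec:
    """Toy codec that returns its input unchanged."""
    input_range = (0.0, 1.0)
    output_range = (0.0, 1.0)
    output_cspace = "rgb"

    def __call__(self, x):
        return list(x)


class ClampDefence:
    """Toy defence: clamp values into [0, 1] before the codec sees them."""

    def preprocess(self, x):
        return [min(max(v, 0.0), 1.0) for v in x]

    def postprocess(self, x):
        return x


class DefendedCodec:
    """Route calls through defence -> codec -> postprocessing."""

    def __init__(self, codec, defence):
        self.codec = codec
        self.defence = defence

    def __call__(self, x):
        x = self.defence.preprocess(x)
        y = self.codec(x)
        return self.defence.postprocess(y)

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward to the wrapped codec.
        if name in ("codec", "defence"):
            raise AttributeError(name)  # avoid recursion before __init__ ran
        return getattr(self.codec, name)


defended = DefendedCodec(IdentityCodec(), ClampDefence())
output = defended([-0.5, 0.3, 1.7])
```

Because unknown attributes are forwarded, `defended.input_range` and `defended.output_cspace` resolve to the wrapped codec's values, so downstream code does not need to know whether a defence is present.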

---

### `dataloaders.py`

Dataset loading utilities for images with PyTorch-compatible data structures and flexible preprocessing options.

* **Path collection** – `collect_image_paths()` recursively searches directories up to a configurable depth, filtering for common image formats (JPG, PNG, BMP, TIFF, etc.) with case-insensitive extension matching.
* **Dataset class** – `ImageFolderDataset` provides PyTorch `Dataset` interface with automatic image loading:
  * Converts BGR→RGB and normalizes to [0,1] range
  * Applies dimension processing (crop, resize, or pad) to ensure compatibility with codec block sizes
  * Returns dictionaries with image tensor, full path, and filename for downstream processing
* **Batch handling** – `image_folder_collate_fn()` custom collate function organizes batches into lists rather than stacked tensors, accommodating variable image dimensions within the same batch.
* **Preprocessing modes**:
  * `'crop'` – truncates dimensions to multiples of `proc_mult`
  * `'resize'` – bilinear interpolation to nearest valid dimensions  
  * `'pad'` – zero-padding to next multiple boundary
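
To make the three modes concrete, the sketch below computes only the target height and width for each mode. The helper name `target_size` is hypothetical, and the rounding rule used for `'resize'` (round to the nearest multiple, never below one block) is an assumption about what "nearest valid dimensions" means.

```python
def target_size(height: int, width: int, proc_mult: int, mode: str):
    """Return (height, width) after making both divisible by proc_mult."""
    def down(v):      # 'crop': largest multiple not exceeding v
        return (v // proc_mult) * proc_mult

    def up(v):        # 'pad': smallest multiple not below v
        return -(-v // proc_mult) * proc_mult

    def nearest(v):   # 'resize': closest multiple, at least one block
        return max(proc_mult, int(v / proc_mult + 0.5) * proc_mult)

    rule = {"crop": down, "pad": up, "resize": nearest}
    if mode not in rule:
        raise ValueError(f"unknown mode: {mode!r}")
    return rule[mode](height), rule[mode](width)
```

With `proc_mult=64`, a 100×130 image maps to 64×128 under `'crop'`, 128×192 under `'pad'`, and 128×128 under `'resize'`. Note that in this sketch `'crop'` yields a zero dimension when a side is smaller than `proc_mult`, a case the real loader would have to handle separately.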

> **Design note**: Currently optimized for `batch_size=1` because image resolutions vary, but it provides a foundation for future multi-image batching.

---

### `raw_data_scheme.py`

Data structure definitions and column schemas for organizing comprehensive codec evaluation results into standardized CSV format.

* **Column specification** – `RAW_RESULTS_COLS` defines the complete schema for raw evaluation data, encompassing:
  * Metadata fields (image_name, codec_name, defence_name, test_dataset, loss_name)
  * Compression statistics (BPP for defended/undefended, clear/attacked scenarios)
  * Quality metrics for all image pair combinations (clear↔reconstructed, attacked↔reconstructed)
  * Reference codec comparisons (JPEG2000 at target quality and fixed BPP)
* **Metric organization** – systematically constructs column names using:
  * Full-reference metrics (`FR_COLS`) for all reconstruction quality assessments
  * No-reference metrics (`NR_COLS`) for perceptual quality evaluation
  * Cross-comparison metrics between different processing pipelines
* **Naming convention** – uses descriptive suffixes to distinguish evaluation scenarios:
  * `defended-rec-clear` vs `rec-clear` (with/without defence)
  * `jpeg` vs `jpeg-fix` (target quality vs target rate matching)

> **Purpose**: Ensures consistent data structure across all evaluation runs, enabling reliable aggregation and comparison of results from different codecs, attacks, and defences.

---

### `codec_losses.py`

A mini-library of differentiable loss terms for image-compression attacks.

* **Color-space utility** – `process_colorspace()` converts tensors between RGB and *YCbCr*, automatically handling the special JPEG-AI case where data live in `[0 … 255]`.
* **Loss collection**

  * Noise-matching: `added_noises_loss()` / `added_noises_loss_Y()`
  * Reconstruction MSE: `reconstr_loss()` / `reconstr_loss_Y()` and `src_reconstr_loss_Y()`
  * FTDA baselines: `ftda_default_loss()` / `ftda_default_loss_Y()`
  * Perceptual variants: `ftda_msssim_loss()` and `reconstruction_msssim_loss()` (multiscale SSIM)
  * Rate term: `bpp_increase_loss()`
  * **Experimental** focus loss: `pointwise_added_noises_loss()` applies a Gaussian-blur mask around a chosen pixel.
* **Registry** – `loss_name_2_func` lets training scripts pick a loss by string key.

> *Tip*: all MSE-style functions return **negative values**, so *maximising* the objective increases quality. Keep this sign convention in mind when reading or combining loss values.
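
The registry pattern and the negative-MSE sign convention can be illustrated with a toy example. Everything here is a stand-in: `LOSS_REGISTRY`, `get_loss`, and `neg_reconstruction_mse` are hypothetical names operating on plain lists, not the entries of `loss_name_2_func`.

```python
def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def neg_reconstruction_mse(source, reconstruction):
    """Negative MSE: values closer to zero mean a closer reconstruction."""
    return -mse(source, reconstruction)


# String key -> callable, so scripts can choose a loss from a config value.
LOSS_REGISTRY = {"neg_reconstruction_mse": neg_reconstruction_mse}


def get_loss(name):
    try:
        return LOSS_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(LOSS_REGISTRY))
        raise ValueError(f"unknown loss {name!r}; known losses: {known}")


loss_fn = get_loss("neg_reconstruction_mse")
good = loss_fn([0.2, 0.4], [0.2, 0.5])   # small error
bad = loss_fn([0.2, 0.4], [0.9, 0.0])    # large error
```

In this sketch `good > bad`, since a larger error produces a more negative value.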

---

### `quality_evaluators.py`

High-level orchestration for comprehensive image quality assessment across multiple codec evaluation scenarios.

* **Metric computation**
  * `evaluate_fr_metrics()` – batch full-reference metrics (PSNR, SSIM, MS-SSIM, VMAF, MSE, MAE, L∞)
  * `evaluate_nr_metrics()` – no-reference quality assessment (NIQE) using pre-loaded torch models
* **Codec evaluation pipeline** – `evaluate_codec_image_quality()` performs exhaustive quality analysis:
  * Clear vs reconstructed (defended/undefended)
  * Attacked vs reconstructed (defended/undefended) 
  * Cross-comparisons (clear-vs-attacked, reconstructed-clear-vs-reconstructed-attacked)
  * Both FR and NR metrics for all image pairs
* **Reference codec benchmarking** – `evaluate_reference_codec()` compares against JPEG2000 baselines:
  * Target-quality matching (equal PSNR)
  * Target-rate matching (equal BPP)
  * Full metric suite on both reference and learned codec outputs

> **Output**: Returns mean metric scores in a flat dictionary structure ready for DataFrame insertion and CSV export.

---

### `codec_scoring_methods.py`

Helpers for aggregating quality-metric results stored in a `pandas.DataFrame`.

* **Delta scores** – quantify attack impact (with defence applied)

  ```text
  FR:  fr_delta_score()      # clear-vs-attacked reconstruction
  NR:  nr_delta_score()
  ```
* **Defence effectiveness** – extra deltas comparing *with* vs *without* defensive preprocessing

  ```text
  FR:  fr_defence_delta_score()
  NR:  nr_defence_delta_score()
  ```
* **Baseline checks**

  * `mean_fr_clear_attacked()` – mean FR metric between *clear* and *attacked* images.
  * `delta_nr_clear_attacked()` – NR difference on originals (no reconstruction).
* **Convenience wrapper** – `calc_scores_codec()` loops over all registered metrics, prints each statistic, and returns a table (`DataFrame`) ready for logging or LaTeX export.

  * Built-in dictionaries `fr_2_lower_better` and `nr_2_lower_better` tell each scorer whether “lower-is-better,” automatically flipping signs where needed.
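
The sign-flipping idea can be sketched as follows. The dictionary `LOWER_BETTER` is a toy stand-in for `fr_2_lower_better`, and the definition of the delta (mean on clear inputs minus mean on attacked inputs, oriented so that a positive value means degradation) is an assumption made for illustration rather than the exact formula used by `fr_delta_score()`.

```python
from statistics import mean

# Toy stand-in for fr_2_lower_better: True means smaller values are better.
LOWER_BETTER = {"psnr": False, "mse": True}


def delta_score(clear_values, attacked_values, metric):
    """Quality drop caused by the attack; positive means degradation."""
    delta = mean(clear_values) - mean(attacked_values)
    # For lower-is-better metrics an attack raises the value, so flip the sign.
    return -delta if LOWER_BETTER[metric] else delta


psnr_drop = delta_score([40.0, 38.0], [30.0, 32.0], "psnr")
mse_drop = delta_score([1.0, 3.0], [5.0, 7.0], "mse")
```

Both calls return a positive number (8.0 and 4.0) even though PSNR falls and MSE rises under attack, which is what makes scores comparable across metrics with opposite orientations.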

---

### `defence_scoring_methods.py`

A collection of helpers that **quantify how well a defence recovers image-quality metrics after an attack**.
All routines operate on a `pandas.DataFrame` whose columns follow the naming pattern used in earlier scripts (`clear`, `attacked`, `defended-clear`, `defended-attacked`, plus SSIM/PSNR columns).

| Group                                | Function(s)                                           | What it measures                                                                         |
| ------------------------------------ | ----------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| **Relative / absolute quality loss** | `robust_rel_gain`, `robust_abs_gain`                  | Δ between pristine images and **defended-attacked** outputs.                             |
|                                      | `both_defended_rel_gain`, `both_defended_abs_gain`    | Δ when *both* clear and attacked images were run through the defence.                    |
|                                      | `nonpurified_rel_gain`, `nonpurified_abs_gain`        | Baseline Δ for **unprotected** attacks (no defence).                                     |
| **Per-pair similarity**              | `defence_similarity_score`                            | Combined SSIM + PSNR of defended-attacked vs. clear images.                              |
|                                      | `defence_clear_similarity_score`                      | Same but on defended-clear vs. clear (should be *high* if defence is “non-destructive”). |
| **Rank-correlation checks (SROCC)**  | `robust_attacked_srocc_mos`, `robust_clear_srocc_mos` | Correlate metric values *after defence* with human MOS collected on the clear originals. |
|                                      | `clear_srocc_mos`, `attacked_srocc_mos`               | Correlate *raw* metric scores (no defence) with MOS.                                     |
|                                      | `robust_clear_srocc_clear`                            | Correlate metric before/after defence on clear images (checks monotonicity).             |


* **`calc_scores_defence(df, metric_range=1)`** iterates over this registry, prints each statistic for quick CLI inspection, and returns a tidy `DataFrame` (`score`, `value`) ready for logging, CSV export, or LaTeX tables.
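
For readers unfamiliar with SROCC, the sketch below computes Spearman's rank correlation in plain Python: values are replaced by their ranks (ties share the average rank), and the Pearson correlation of the ranks is returned. It is a didactic version with hypothetical helper names; the module itself may rely on a library routine such as `scipy.stats.spearmanr`.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def srocc(metric_values, mos_values):
    """Spearman rank-order correlation between metric scores and MOS."""
    return pearson(average_ranks(metric_values), average_ranks(mos_values))
```

A metric that orders images exactly like the MOS gives `srocc(...) == 1.0`, and a perfectly reversed ordering gives `-1.0`, regardless of the metric's scale.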

---

### `metrics.py`

Tiny wrapper layer that standardises all **quality-metric calls** used in the pipeline.

| Function / Class           | Role                                                                                  |
| -------------------------- | ------------------------------------------------------------------------------------- |
| `PSNR`                     | Peak signal-to-noise ratio via **skimage**.                                           |
| `SSIM`                     | Mean structural similarity (multi-frame friendly).                                    |
| `MSE`, `MAE`, `L_inf_dist` | Classic pixel-wise errors.                                                            |
| `MSSSIM`                   | Multiscale SSIM using **pytorch-msssim** (GPU-friendly).                              |
| `vmaf`                     | **VMAF** via FFmpeg/`libvmaf`. Saves temp PNGs, runs subprocess, parses the JSON log. |
| `niqe`                     | Thin `torch.nn.Module` that wraps **piq**’s NIQE (`lower_better=True`).               |

> **Tip**: all helpers return **Python scalars** (except `vmaf`, which returns a 0-D tensor) so they can drop straight into NumPy/Pandas without type juggling.
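
As a reminder of what the PSNR wrapper reports, here is the standard definition, PSNR = 10 · log10(MAX² / MSE), written in plain Python. This is a reference sketch of the formula only; the module delegates the computation to skimage, and the function name and `data_range` default below are assumptions.

```python
import math


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB for two equally long sequences."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(data_range ** 2 / mse)
```

For images in the [0, 1] range, a uniform error of 0.1 corresponds to an MSE of 0.01 and therefore a PSNR of about 20 dB.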

---

### `helpers.py`

General utility functions providing tensor conversion, image I/O, attack orchestration, and configuration loading across the evaluation pipeline.

* **Tensor utilities**
  * `to_torch()` / `to_numpy()` – seamless conversion between NumPy (NHWC) and PyTorch (NCHW) formats
  * `center_crop()` – standardized 256×256 cropping with automatic upsampling for smaller images
* **Color space conversion** – JPEGAI-specific helpers for YCbCr→RGB transformation with proper range handling (0-255 vs 0-1)
* **Attack coordination** – `apply_attack()` manages model state transitions (train→eval), sets seeds for reproducibility, times execution, and handles attack failure cases
* **Codec application** – `apply_codec()` wraps codec inference with timing, BPP extraction, and format standardization, supporting both standard and JPEGAI dual-codec modes
* **Configuration loading**
  * `load_attack_params_json()` / `load_defence_params_json()` – preset parameter loading with fallback to defaults
  * `fill_df_metadata()` – standardized metadata column population for result DataFrames
* **Image I/O** – `save_image()` handles tensor→PNG conversion with proper RGB format and range clamping

> **Key feature**: All functions handle the JPEGAI special case (YCbCr, 0-255 range) transparently while maintaining compatibility with standard RGB codecs.
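
Two of the quantities collected by these helpers, wall-clock time and bits per pixel, are simple to compute. The sketch below shows one common way to do so, assuming the compressed size is available as a byte count; `timed_call` and `bits_per_pixel` are hypothetical names, and how `apply_codec()` actually obtains the BPP is not specified here.

```python
import time


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Compressed size in bits divided by the number of pixels."""
    return 8.0 * num_bytes / (height * width)


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


result, elapsed = timed_call(sum, [1, 2, 3])
bpp = bits_per_pixel(8192, 256, 256)
```

An 8192-byte bitstream for a 256×256 image gives exactly 1.0 BPP. When timing GPU inference, the device usually has to be synchronized before reading the clock, which this CPU-only sketch does not cover.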

---

### `color_transforms_255.py`

Utility wrappers around **PyTorch** tensors for fast, differentiable colour-space conversion in the *0 … 255* range.

| Function             | Purpose                                                                                                  |
| -------------------- | -------------------------------------------------------------------------------------------------------- |
| `_rgb_to_y()`        | Core helper that computes the luminance channel **Y** from R G B tensors.                                |
| `rgb_to_ycbcr_255()` | Convert an RGB image to full-range **YCbCr**. Returns a 3-channel tensor in the same shape as the input. |
| `rgb_to_y_255()`     | Extract *only* the Y (luma) channel from RGB. Handy for Y-only metrics.                                  |
| `ycbcr_to_rgb_255()` | Inverse transform back to RGB; clamps output to `[0, 255]`.                                              |
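
For orientation, the sketch below applies a full-range RGB↔YCbCr transform to a single pixel. It assumes the BT.601 coefficients used by JPEG/JFIF; the coefficients in `color_transforms_255.py` may differ, and the `_scalar` function names are hypothetical to avoid confusion with the tensor versions.

```python
def clamp_255(value: float) -> float:
    return min(max(value, 0.0), 255.0)


def rgb_to_ycbcr_scalar(r: float, g: float, b: float):
    """Full-range YCbCr (BT.601 / JFIF coefficients, assumed) for one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb_scalar(y: float, cb: float, cr: float):
    """Inverse transform; the result is clamped to [0, 255]."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return clamp_255(r), clamp_255(g), clamp_255(b)


round_trip = ycbcr_to_rgb_scalar(*rgb_to_ycbcr_scalar(12.0, 200.0, 99.0))
```

A neutral gray such as (128, 128, 128) maps to Y = 128 with both chroma channels at 128, and a round trip reproduces the input up to the rounding of the published coefficients.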

---


### `traditional_reference_codec.py`

Quick wrappers around **Glymur** to benchmark *JPEG 2000* against your learned codecs.

| Function                                                  | Job                                                                                                                                 |
| --------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `jpeg2k_compress(src, dump_path, target_quality, device)` | Encodes each RGB frame at a **given PSNR** (one target per image). Returns the decoded tensor ∈ \[0 … 1] and a list of actual BPPs. |
| `jpeg2k_compress_fix_bpp(src, dump_path, bpp, device)`    | Encodes at a **fixed bit-per-pixel**. Uses `cratios = 24 / bpp`. Same outputs as above.                                             |
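
The `cratios = 24 / bpp` rule follows from an uncompressed 8-bit RGB image using 24 bits per pixel: the compression ratio is the raw bit depth divided by the target rate. The helper below only reproduces that arithmetic; its name is hypothetical and it does not call Glymur.

```python
def compression_ratio_for_bpp(bpp: float, bits_per_raw_pixel: int = 24) -> float:
    """Compression ratio needed to reach `bpp` from an uncompressed image."""
    if bpp <= 0:
        raise ValueError("bpp must be positive")
    return bits_per_raw_pixel / bpp
```

For instance, a target of 0.5 BPP corresponds to a ratio of 48, and a target of 24 BPP corresponds to a ratio of 1 (no compression).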

---
