# Scripts for Metric Evaluation

This directory contains scripts used to evaluate various metrics, particularly for image and text generation models, as referenced in the paper "TIPO: Text to Image with Text Pre-sampling for Prompt Optimization."

Please ensure you have any necessary dependencies installed before running these scripts. Refer to individual script comments or project documentation for more details on setup.

## Script Descriptions

* **`inference.py`**
    * **Purpose**: Helper script that runs the SD3.5m model with parallel workers. It is designed to be called by `main.py`.
    * **Note**: This script is not intended to be run directly.

* **`main.py`**
    * **Purpose**: Main script for running model inference.
    * **Input**: A `.jsonl` file (e.g., `<dataset_name>.jsonl`) containing prompts or other necessary input data.
    * **Note**: This script may call `inference.py` or `worker.py`, depending on the selected model or configuration. Check its arguments or internal configuration to see how the model is selected.

* **`worker.py`**
    * **Purpose**: Worker script specifically designed for use with the SD3.5l model.
    * **Note**: Typically used by `main.py` or a similar orchestrator when the SD3.5l model is selected for inference.

* **`visualize_worst.py`**
    * **Purpose**: Generates a grid visualization of the images with the highest and lowest aesthetic scores in a dataset.
    * **Input**: Relies on `aesthetic.json` (generated by `metric.py` or a similar script).
    * **Output**: An image file displaying the grid of selected images.
    * **Prerequisite**: `aesthetic.json` must be generated and available in the expected location.

* **`metric.py`**
    * **Purpose**: Calculates aesthetic scores for a collection of images using the `aesthetic_predictor_v2_5` model.
    * **Input**: A directory containing the images to be scored.
    * **Output**: A `.json` file (e.g., `aesthetic.json`) mapping image filenames to their aesthetic scores.

* **`corrupt.py`**
    * **Purpose**: Evaluates images for various types of corruption and calculates corruption scores using `AICorruptMetrics`.
    * **Input**: A directory containing the images to be scored.
    * **Output**: A `.json` file (e.g., `corrupt.json`) mapping image filenames to their corruption scores.

* **`run_vendi_single.py`**
    * **Purpose**: Computes the Vendi Score for a given set of images, which measures dataset diversity.
    * **Input**: A directory containing the images.
    * **Output**: A `.json` file containing the calculated Vendi scores.

* **`metrics/stats_visualize.py`**
    * **Purpose**: Creates density plots to visualize the distribution of aesthetic and corruption scores.
    * **Input**: `aesthetic.json` and `corrupt.json` files.
    * **Output**: Image file(s) of the generated density plots.
    * **Note**: This script is expected to be located in a subfolder named `metrics` within this main metrics directory (i.e., `metrics/metrics/stats_visualize.py` from the project root). If it's in the current directory, remove the `metrics/` prefix.

* **`metrics/verbose.py`**
    * **Purpose**: Generates and prints a verbose summary of evaluation results from multiple models.
    * **Input**: A main directory that contains subfolders for each model. Each model's subfolder is expected to contain its respective `aesthetic.json` and `corrupt.json` files.
    * **Output**: A detailed summary of results printed to the console.
    * **Note**: This script is expected to be located in a subfolder named `metrics` within this main metrics directory. If it's in the current directory, remove the `metrics/` prefix.

* **`metrics/correl.py`**
    * **Purpose**: Supplementary script for analyzing the correlation between image characteristics (e.g., color profiles, detected objects, style attributes) and model evaluation scores. This can help understand model sensitivities or "fragilities."
    * **Input**: Image data (e.g., directory of images) and corresponding score files (e.g., `aesthetic.json`, `corrupt.json`). Specific inputs might vary based on the analyses performed.
    * **Output**: Correlation statistics, potentially plots or reports, which may be printed to the console or saved to files.
    * **Note**: This script is expected to be located in a subfolder named `metrics` within this main metrics directory. If it's in the current directory, remove the `metrics/` prefix.

* **`clip_inference.py`**
    * **Purpose**: Calculates image-text similarity scores using a CLIP model.
    * **Input**:
        * Path to an image file.
        * A text string (prompt/caption) or path to a text file containing the text.
    * **Output**: The CLIP similarity score, typically printed to the console or saved to a `.json` file.

* **`clip_t2t_inference.py`**
    * **Purpose**: Computes text-to-text similarity between pairs of texts provided in a `.jsonl` file. This is often used to compare input prompts with generated textual descriptions or captions.
    * **Input**: A `.jsonl` file where each line is a JSON object containing fields for the first text and the second text (e.g., `{"input_text": "...", "output_text": "..."}`).
    * **Output**: Similarity scores. These might be appended to the original `.jsonl` entries, saved to a new `.jsonl` file, or printed to the console.

* **`llama-prompt-gen.py`**
    * **Purpose**: A supplementary script for generating creative prompts using the `llama3.2-1b` language model.
    * **Note**: While this script is functional, the prompts generated by the `llama3.2-1b` model were often censored or of inconsistent quality, and thus were not utilized in the final paper experiments. This script is provided for completeness or for further experimentation.

* **`statistics-calculate.py`**
    * **Purpose**: A helper script to compute and display summary statistics (e.g., mean, median, standard deviation, min/max) from generated score files like `aesthetic.json` and `corrupt.json`.
    * **Input**: One or more `.json` score files.
    * **Output**: Summary statistics printed to the console or potentially saved to a text or CSV file.

* **`symlink-worst.py`**
    * **Purpose**: Creates symbolic links to the highest and lowest scoring images (e.g., based on aesthetic scores) in a specified output folder. This facilitates quick manual review and visualization of extreme examples.
    * **Input**: A score file (e.g., `aesthetic.json`) and the path to the directory containing the original images.
    * **Output**: A new folder populated with symbolic links to the selected images.
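
Several of the scripts above (`statistics-calculate.py`, `symlink-worst.py`, `visualize_worst.py`) consume score files such as `aesthetic.json`. The following is a minimal sketch of that kind of processing, assuming the score file is a flat JSON object that maps each image filename to a single numeric score. The function names `load_scores`, `summarize`, and `extremes` are hypothetical and are not taken from the repository; the actual scripts may use a different schema or different statistics.

```python
import json
import statistics


def load_scores(path):
    """Load a score file, assumed to be a flat {filename: score} JSON object."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {name: float(score) for name, score in data.items()}


def summarize(scores):
    """Return basic summary statistics for a {filename: score} mapping."""
    values = list(scores.values())
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def extremes(scores, k=3):
    """Return (k lowest-scoring, k highest-scoring) filenames."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    return ranked[:k], ranked[-k:][::-1]     # best list is highest first


if __name__ == "__main__":
    # Illustrative data only; real scores come from a file such as aesthetic.json.
    demo = {"a.png": 1.0, "b.png": 2.0, "c.png": 3.0, "d.png": 6.0}
    print(summarize(demo))
    print(extremes(demo, k=2))
```

With a real score file, `summarize(load_scores("aesthetic.json"))` would give the summary, and the two lists returned by `extremes` could then be used to build a grid or to create symbolic links for manual review.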

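For readers unfamiliar with the metric computed by `run_vendi_single.py`: the Vendi Score is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix `K / n`. The sketch below illustrates that definition with NumPy, assuming each image has already been turned into a non-zero embedding vector and that cosine similarity is used as the kernel. How the repository's script actually obtains embeddings, and which kernel it uses, is not stated here, so `vendi_score` is a hypothetical helper and not the script's implementation.

```python
import numpy as np


def vendi_score(embeddings):
    """Vendi Score of a set of embedding vectors under cosine similarity.

    Assumes `embeddings` has shape (n, d) and contains no all-zero rows.
    """
    x = np.asarray(embeddings, dtype=np.float64)
    # Normalize rows so that x @ x.T is the cosine-similarity matrix.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    n = x.shape[0]
    k = (x @ x.T) / n  # eigenvalues of K / n sum to 1
    eigvals = np.linalg.eigvalsh(k)
    # Drop zero (or numerically negative) eigenvalues: 0 * log(0) is taken as 0.
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))


if __name__ == "__main__":
    identical = np.ones((4, 8))  # four copies of the same vector
    orthogonal = np.eye(3)       # three mutually orthogonal vectors
    print(vendi_score(identical))
    print(vendi_score(orthogonal))
```

The score can be read as an effective number of distinct items: a set of identical embeddings scores close to 1, while `n` mutually orthogonal embeddings score close to `n`.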