# Evaluate T2I models along diversity, quality, consistency as functions of prompt complexity
## Environment dependencies
The base environment configuration is in the file `envirionment.yml`.
For the FAISS search, use the environment in the file `environment_faiss.yml`.
For evaluation, we adopt several packages:
+ For aesthetic_evaluation, check https://github.com/discus0434/aesthetic-predictor-v2-5
+ For diversity_evaluation, check https://github.com/vertaix/Vendi-Score
+ For marginal_evaluation, check https://github.com/layer6ai-labs/dgm-eval/
+ For consistency_evaluation, check https://github.com/facebookresearch/EvalGIM/
## Framework
Use the scripts in folder `pipeline` to create datasets of different complexities and search clusters.
1. Run `vlm_captioning_multi_compleixity.py` to caption the dataset to different lengths.
2. Run `gather_captioning_multi_complexity.py` to create a metadata file for image-caption pairs of different lengths.
3. Run `get_siglip_embeddings_img.py` and `get_siglip_embeddings_text.py` to compute SigLIP embeddings for the images and the captions.
4. Run `faiss_search.py` to find the most similar images for each caption.
5. Run `cluster_formation.py` to keep the clusters that satisfy the similarity threshold and the minimum cluster size, and to sample the captions needed for generation.
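
The filtering in step 5 can be sketched roughly as follows. This is a minimal illustration and not the code in `cluster_formation.py`: the function names, the input layout (a dict from caption id to retrieved `(image_id, similarity)` pairs), and the default threshold values are all assumptions made for the example.

```python
import random


def form_clusters(neighbors, sim_threshold=0.8, min_cluster_size=3):
    """Filter retrieved neighbors into clusters (hypothetical helper).

    neighbors: dict mapping caption_id -> list of (image_id, similarity),
               e.g. the output of a nearest-neighbor search.
    Returns a dict caption_id -> list of image_ids for the clusters kept.
    """
    clusters = {}
    for caption_id, hits in neighbors.items():
        # Keep only images similar enough to the caption.
        kept = [image_id for image_id, sim in hits if sim >= sim_threshold]
        # Drop clusters that end up too small.
        if len(kept) >= min_cluster_size:
            clusters[caption_id] = kept
    return clusters


def sample_captions(clusters, num_captions, seed=0):
    """Sample caption ids for generation, reproducibly (hypothetical helper)."""
    rng = random.Random(seed)
    caption_ids = sorted(clusters)
    return rng.sample(caption_ids, min(num_captions, len(caption_ids)))


# Toy input: cap_b has only one neighbor above the threshold, so it is dropped.
toy_neighbors = {
    "cap_a": [("img1", 0.91), ("img2", 0.85), ("img3", 0.82)],
    "cap_b": [("img4", 0.95), ("img5", 0.40), ("img6", 0.30)],
}
toy_clusters = form_clusters(toy_neighbors, sim_threshold=0.8, min_cluster_size=3)
print(toy_clusters)  # {'cap_a': ['img1', 'img2', 'img3']}
```

The actual thresholds and sampling strategy should be taken from `cluster_formation.py`.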
## Evaluation
Use the scripts in folder `evaluation` to compute different metrics.
1. Aesthetic score: Install [aesthetic score](https://github.com/discus0434/aesthetic-predictor-v2-5). Run `sbatch aesthetic_evaluator.sh` under `aesthetic_evaluation`.
2. DSG score: Clone [EvalGIM](https://github.com/facebookresearch/EvalGIM/). Configure the dataset following https://github.com/facebookresearch/EvalGIM/?tab=readme-ov-file#add-your-own-datasets. We prepare the dataset class for CC12M with different complexities in `cc12m_dataset_evalgim.py`. Clone [DSG](https://github.com/j-min/DSG) and use `gen_dsg.py` to get question graphs. Run `sbatch run_sbatch_evaluation_conditional.sh` under `consistency_evaluation`.
3. Vendi score: Clone [vendi score](https://github.com/vertaix/Vendi-Score). Run `python vendi_calculation.py` under `diversity_evaluation`.
4. Marginal metrics: Clone [dgm-eval](https://github.com/layer6ai-labs/dgm-eval/). Run `sbatch run_evaluation_cc12m.sh` under `marginal_evaluation`.
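
For intuition about the diversity metric, the Vendi score is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix `K / n`. The sketch below computes it directly with NumPy. It is an illustration only: it assumes a cosine-similarity kernel over feature vectors and does not reproduce `vendi_calculation.py` or the API of the Vendi-Score package.

```python
import numpy as np


def vendi_score(features):
    """Vendi score of a set of feature vectors with a cosine-similarity kernel.

    features: array-like of shape (n_samples, dim), one embedding per image.
    """
    x = np.asarray(features, dtype=np.float64)
    # Unit-normalize rows so that K has ones on its diagonal.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    n = x.shape[0]
    k = x @ x.T
    # Eigenvalues of K / n are non-negative and sum to 1.
    eigvals = np.linalg.eigvalsh(k / n)
    # Drop numerically zero eigenvalues (0 * log 0 is treated as 0).
    eigvals = eigvals[eigvals > 1e-12]
    entropy = -np.sum(eigvals * np.log(eigvals))
    return float(np.exp(entropy))
```

Under this definition, a set of identical samples scores 1 and a set of `n` mutually orthogonal samples scores `n`, so the value can be read as an effective number of distinct samples.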
## Dataset
1. Download the CC12M images following https://github.com/google-research-datasets/conceptual-12m. Put the images under `metadata/cc12m/images/`.
2. Run the scripts in the Framework section to get the clusters for each complexity. Then create `eval_imgs.zip` containing all the remaining real images under `metadata/cc12m/`. In our scripts, we sharded `eval_imgs` into 9 zip files.
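
One possible way to shard the images into several zip files is sketched below. The function name, the shard naming scheme (`eval_imgs_0.zip`, `eval_imgs_1.zip`, ...), and the round-robin assignment are assumptions made for illustration; the evaluation scripts may expect different file names, so check them before relying on this. Selecting which real images remain after cluster formation is not covered here.

```python
import zipfile
from pathlib import Path


def shard_into_zips(image_dir, out_dir, num_shards=9, prefix="eval_imgs"):
    """Split the files under image_dir into num_shards zip archives.

    Hypothetical helper: shard names follow the assumed pattern
    <prefix>_<index>.zip and files are assigned round-robin.
    """
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Sort for a deterministic assignment of files to shards.
    files = sorted(p for p in image_dir.rglob("*") if p.is_file())
    shard_paths = []
    for shard_idx in range(num_shards):
        shard_path = out_dir / f"{prefix}_{shard_idx}.zip"
        # Images are usually already compressed, so store them as-is.
        with zipfile.ZipFile(shard_path, "w", zipfile.ZIP_STORED) as zf:
            for path in files[shard_idx::num_shards]:
                zf.write(path, arcname=path.relative_to(image_dir).as_posix())
        shard_paths.append(shard_path)
    return shard_paths
```

For example, `shard_into_zips("metadata/cc12m/eval_imgs", "metadata/cc12m")` would write nine archives next to the image folder, where the `eval_imgs` folder name is again a placeholder.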