# Data Provenance in IARS

Extended functionality for the [Microsoft VQ-Diffusion model](https://github.com/microsoft/VQ-Diffusion) with additional scripts for image generation, VQVAE fine-tuning, loss analysis, and timing benchmarks.

## Installation

### Environment Setup

For the original VQ-Diffusion setup, installation requirements, data preparation, and pretrained models, please refer to the [original Microsoft VQ-Diffusion README](https://github.com/microsoft/VQ-Diffusion/blob/main/readme.md).

Install requirements:
```bash
bash install_req.sh
```

Download pretrained models:
```bash
bash vqdiffusion_download_checkpoints.sh
```

### Model Checkpoints

All pretrained VQ-Diffusion checkpoints are expected under `OUTPUT/pretrained_model/`, following the original repository structure:
- `ithq_learnable.pth` - ITHQ model with learnable classifier-free guidance
- `coco_learnable.pth` - MS-COCO model
- `imagenet_learnable.pth` - ImageNet model
- Configuration files in `configs/`
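
Before launching a long run, it can help to confirm that the checkpoints are where the scripts expect them. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and it only checks the three filenames listed above.

```python
from pathlib import Path

# Filenames taken from the list above; the default directory follows
# the original repository layout.
EXPECTED_CHECKPOINTS = (
    "ithq_learnable.pth",
    "coco_learnable.pth",
    "imagenet_learnable.pth",
)


def missing_checkpoints(root="OUTPUT/pretrained_model"):
    """Return the expected checkpoint filenames that are not present under root."""
    root = Path(root)
    return [name for name in EXPECTED_CHECKPOINTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All expected checkpoints found.")
```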

## Usage

### Large-scale Image Generation

Generate images from MS-COCO captions or ImageNet classes using `generate_images.py`.

For text-to-image generation (50,000 images from MS-COCO captions):
```bash
python generate_images.py
```

Configure the run by editing the `Config` class in the script:
```python
class Config:
    MODEL_NAME = 'ithq'  # Model selection
    MSCOCO_DATA_PATH = "/data/datasets/mscoco2014val/annotations/captions_val2014.json"
    NUM_PROMPTS = 50000  # Number of images to generate
    BATCH_SIZE = 25      # Processing batch size
    TRUNCATION_RATE = 1.0
    GUIDANCE_SCALE = 5.0
    EXPORT_TOKENS = True  # Required for fine-tuning
```

For ImageNet class generation:
```python
class Config:
    MODEL_NAME = 'imagenet'
    USE_RANDOM_CLASSES = True
    NUM_PROMPTS = 1000  # Number of random classes
    REPLICATE = 1       # Images per class
```
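
For the text-to-image case, the relationship between `MSCOCO_DATA_PATH`, `NUM_PROMPTS`, and `BATCH_SIZE` can be illustrated with a short sketch. How `generate_images.py` actually selects and batches captions is not documented here, so the functions below (`sample_captions`, `batched`) and the seeded shuffle are assumptions for illustration only. The sketch relies on the standard COCO captions format, where the JSON file holds an `annotations` list whose entries each carry a `caption` field.

```python
import json
import random


def sample_captions(annotation_path, num_prompts, seed=0):
    """Draw up to num_prompts captions from a COCO captions annotation file."""
    with open(annotation_path, encoding="utf-8") as f:
        annotations = json.load(f)["annotations"]
    captions = [entry["caption"].strip() for entry in annotations]
    # Seeded shuffle so the same prompts are selected on every run (assumption).
    rng = random.Random(seed)
    rng.shuffle(captions)
    return captions[:num_prompts]


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

With `NUM_PROMPTS = 50000` and `BATCH_SIZE = 25`, this kind of batching yields 2,000 batches, provided the annotation file contains at least 50,000 captions.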

### VQVAE Fine-tuning

Fine-tune VQVAE components on generated images and their ground-truth tokens using `finetune_vqvae_with_dataset.py`:

```bash
python finetune_vqvae_with_dataset.py
```

Configure training by editing the `VQVAEFinetuningConfig` class:
```python
class VQVAEFinetuningConfig:
    DATASET_ROOT = "/path/to/generated_images/ithq_np50000_pr2_seed5"
    TRAIN_ENCODER = True     # Fine-tune encoder
    TRAIN_DECODER = False    # Keep decoder frozen
    NUM_EPOCHS = 50
    BATCH_SIZE = 16
    LEARNING_RATE = 5e-5
    MSE_FEAT_WEIGHT = 1.0   # Feature matching loss
```

The script requires:
- Generated images from `generate_images.py`
- Ground-truth tokens (saved automatically as `all_tokens.pt` when `EXPORT_TOKENS = True`)
- A Weights & Biases (WandB) account for experiment tracking

### Loss Analysis and Detection

Run the loss analysis across multiple datasets using `loss_analysis.py`:

```bash
python loss_analysis.py
```

Configure datasets and analysis parameters:
```python
class Config:
    DATASETS = {
        "ImageNet (val)": "/datasets/imagenet256_eval",
        "LAION": "/data/datasets/laion_1k_clean",
        "MS-COCO": "/data/datasets/mscoco2014val/val2014_subset",
    }
    GENERATED_IMAGES_FOLDER = "/path/to/vq_diffusion/generated"
    USE_LATENT_TRACER = True
    FINETUNED_VQVAE_WEIGHTS = None  # Path to fine-tuned weights (None uses pretrained weights)
```

Outputs:
- AUC and TPR@1%FPR detection metrics
- ROC curves and performance heatmaps
- Distribution visualizations
- CSV summary files
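
To make the two detection metrics concrete, here is a minimal pure-Python sketch of AUC and TPR@1%FPR. It is not the code used in `loss_analysis.py`. It assumes each image has a scalar detection score where a higher value means "more likely generated" (for example, a negated loss); the function names and that score convention are assumptions.

```python
def auc(neg_scores, pos_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def tpr_at_fpr(neg_scores, pos_scores, max_fpr=0.01):
    """Highest TPR reachable with a threshold whose FPR does not exceed max_fpr."""
    # Number of negatives that may score strictly above the threshold.
    allowed = int(max_fpr * len(neg_scores))
    ranked = sorted(neg_scores, reverse=True)
    # Use the (allowed + 1)-th largest negative as threshold;
    # a sample is flagged as generated if its score is strictly greater.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)
```

Here `neg_scores` would hold scores of real images (for example from ImageNet, LAION, or MS-COCO) and `pos_scores` those of generated images. The pairwise AUC loop is quadratic, which is fine for illustration but slow for large sets.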

### Timing Benchmarks

Benchmark inference performance using `timing_benchmark.py`:

```bash
python timing_benchmark.py
```

Configuration options:
```python
class TimingConfig:
    NUM_SAMPLES = 100
    NUM_REPEATS = 5
    TEST_DATASET_PATH = "/datasets/imagenet256_eval"
    ENABLE_LATENT_TRACER = True
    LATENT_TRACER_ITERS = 100
```
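
The roles of `NUM_SAMPLES` and `NUM_REPEATS` can be illustrated with a generic timing loop. This is a sketch under assumptions, not the logic of `timing_benchmark.py`: the `benchmark` helper and its returned keys are hypothetical, and `fn` stands in for whatever per-sample inference call is being measured.

```python
import statistics
import time


def benchmark(fn, samples, num_repeats=5):
    """Time fn over all samples, repeated num_repeats times; report per-sample seconds."""
    per_sample = []
    for _ in range(num_repeats):
        start = time.perf_counter()
        for sample in samples:
            fn(sample)
        elapsed = time.perf_counter() - start
        per_sample.append(elapsed / len(samples))
    return {
        "mean_s": statistics.mean(per_sample),
        "stdev_s": statistics.stdev(per_sample) if len(per_sample) > 1 else 0.0,
        "min_s": min(per_sample),
    }
```

When timing GPU inference with PyTorch, CUDA kernels run asynchronously, so a call to `torch.cuda.synchronize()` before each clock reading is needed for the measurements to be meaningful.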

## Key Features

### Configuration
All scripts use class-based configuration. Edit the attributes of each script's configuration class (`Config`, `VQVAEFinetuningConfig`, or `TimingConfig`) directly in the source file rather than passing command-line arguments.
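
Because settings live in class attributes, it can be handy to snapshot them alongside results for reproducibility. The helper below is a hypothetical sketch (the scripts may or may not log their configuration this way), and the `Config` class here is a stand-in that reuses a few values from the examples above.

```python
def config_to_dict(config_cls):
    """Collect the upper-case class attributes of a config class into a dict."""
    # vars() only sees attributes defined on this class, not inherited ones.
    return {
        name: value
        for name, value in vars(config_cls).items()
        if name.isupper()
    }


class Config:  # stand-in for illustration only
    MODEL_NAME = "ithq"
    NUM_PROMPTS = 50000
    BATCH_SIZE = 25


print(config_to_dict(Config))
```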

## Acknowledgements

This code extends the official implementation of [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://github.com/microsoft/VQ-Diffusion).

If you use the original VQ-Diffusion model, please cite:
```bibtex
@article{gu2021vector,
  title={Vector Quantized Diffusion Model for Text-to-Image Synthesis},
  author={Gu, Shuyang and Chen, Dong and Bao, Jianmin and Wen, Fang and Zhang, Bo and Chen, Dongdong and Yuan, Lu and Guo, Baining},
  journal={arXiv preprint arXiv:2111.14822},
  year={2021}
}

@misc{tang2023improvedvectorquantizeddiffusion,
  title={Improved Vector Quantized Diffusion Models}, 
  author={Zhicong Tang and Shuyang Gu and Jianmin Bao and Dong Chen and Fang Wen},
  year={2023},
  eprint={2205.16007},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2205.16007}, 
}
```
