# The Hidden Language of Diffusion Models - Official Implementation
This repository contains the code to reproduce the results of our paper, *The Hidden Language of Diffusion Models*.

### Environment
Our code builds on the requirements of the [Hugging Face Diffusers repository](https://github.com/huggingface/diffusers). To set up the environment, please run:

```
conda env create -f environment.yaml
conda activate conceptor
```

### Step 1 - Extract the CLIP text embeddings for the vocabulary
To avoid recomputing the CLIP text embeddings for the entire vocabulary every time we optimize a decomposition, we provide a script that computes them once and saves them to disk, so that they can be loaded for each concept.

```
python save_dictionary_embeddings.py --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" --path_to_encoder_embeddings="./clip_text_encoding.pt"
```
This script computes the CLIP text embeddings for the vocabulary of Stable Diffusion (SD), loaded from the pretrained weights given by ```pretrained_model_name_or_path```, and saves them to the path given by ```path_to_encoder_embeddings```.
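The compute-once, load-many-times idea behind this step can be sketched in a few lines. This is only an illustration of the caching pattern, not the repository's implementation: `load_or_compute`, `fake_embed_vocabulary`, the toy vectors, and the use of `pickle` are all assumptions made for the example, whereas the actual script writes a `.pt` file.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(cache_path, compute_fn):
    """Return the cached result if cache_path exists; otherwise compute, save, and return it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)
    data = compute_fn()
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


# Hypothetical stand-in for an expensive text-encoder pass over the vocabulary.
calls = {"count": 0}


def fake_embed_vocabulary():
    calls["count"] += 1
    return {"dog": [0.1, 0.2], "cat": [0.3, 0.4]}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "vocab_embeddings.pkl"
    first = load_or_compute(cache, fake_embed_vocabulary)   # computes and saves
    second = load_or_compute(cache, fake_embed_vocabulary)  # loads from disk

print("embedding function calls:", calls["count"])
```

The expensive function runs only on the first call; every later concept would reuse the saved file.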

### Step 2 - Decompose the concept
We provide an end-to-end script that decomposes the concept and performs validation.
```
python one_step_reconstruction.py --prompt="a photo of a dog" --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1-base" --placeholder_token="<>" --train_batch_size=6 --validation_prompt="a photo of a <>" --num_validation_images=20 --train_data_dir="./dog_train" --validation_data_dir="./dog_val" --output_dir="./dog" --dictionary_size=5000 --num_explanation_tokens=50 --validation_steps=50 --learning_rate=1e-3 --max_train_steps 500 --seed 1024 --validation_seed 2 --sparsity_coeff=0.001 --path_to_encoder_embeddings="./clip_text_encoding.pt" > ./log.txt
```
This script generates training and validation images for the concept prompt ```prompt``` and saves them to ```train_data_dir``` and ```validation_data_dir```, respectively. The vocabulary embeddings are loaded from the file saved in step 1 (```path_to_encoder_embeddings='./clip_text_encoding.pt'```). The log for the run is written to ```log.txt```, and the validation images and the coefficients for the entire vocabulary are saved to ```output_dir```.
The best validation coefficients will be saved in ```best_alphas.pt```.
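
For intuition about what the saved coefficients represent, here is a minimal sketch. It assumes the decomposition expresses the placeholder token as a weighted sum of vocabulary embeddings, restricted to the highest-weighted tokens (compare ```num_explanation_tokens```). The function name, the toy vectors, and the plain-list representation are illustrative assumptions; the actual script operates on tensors and its exact formulation may differ.

```python
import heapq


def combine_top_tokens(coefficients, embeddings, num_tokens):
    """Weighted sum of the embeddings whose coefficients are the num_tokens largest."""
    if len(coefficients) != len(embeddings):
        raise ValueError("coefficients and embeddings must have the same length")
    # Indices of the num_tokens largest coefficients.
    top = heapq.nlargest(
        num_tokens, range(len(coefficients)), key=coefficients.__getitem__
    )
    dim = len(embeddings[0])
    combined = [0.0] * dim
    for i in top:
        for d in range(dim):
            combined[d] += coefficients[i] * embeddings[i][d]
    return combined


# Toy example: 4 "vocabulary" tokens with 2-dimensional embeddings (made-up values).
coefficients = [0.5, 0.0, 2.0, 1.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
pseudo_token = combine_top_tokens(coefficients, embeddings, num_tokens=2)
print(pseudo_token)
```

Only the two tokens with the largest coefficients contribute here; the remaining tokens are ignored, which is what keeps the explanation sparse.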


### Step 3 - Extract the top 50 tokens and generate images
The notebook ```visualize_concept.ipynb``` contains code to visualize the top tokens of the learned decomposition.
We also provide **non-cherry-picked** results that demonstrate how well our method reconstructs the concept: we generate the first 6 images with seed 0, using both our learned token and the original concept (*"dog"*).

<p align="center">
<img src="notebooks/notebook.png" width="700px"/> 
</p>
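
The token ranking performed in this step can be sketched as follows. This is a simplified illustration rather than the notebook's code: it assumes the coefficients have already been loaded into a flat sequence aligned with a list of vocabulary strings, and `top_tokens` and the toy values are hypothetical.

```python
def top_tokens(coefficients, vocabulary, k=50):
    """Return the k (token, coefficient) pairs with the largest coefficients."""
    if len(coefficients) != len(vocabulary):
        raise ValueError("coefficients and vocabulary must have the same length")
    ranked = sorted(
        zip(vocabulary, coefficients), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


# Toy vocabulary and coefficients (made-up values, for illustration only).
vocabulary = ["dog", "car", "puppy", "tree", "pet"]
coefficients = [0.9, 0.01, 0.7, 0.0, 0.4]

for token, weight in top_tokens(coefficients, vocabulary, k=3):
    print(f"{token}: {weight:.2f}")
```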
 
