# Language Models Can See: Plugging Visual Controls in Text Generation

## In this file, we show how to reproduce the results presented in the paper.
### 1. Demo:
We provide a demo that illustrates the inner workings of our approach. To view it, open the file demo.gif in a web browser (e.g., Safari or Chrome).

### 2. Environment Setup (Python version: 3.8):
    pip install -r requirements.txt

### 3. Reproduce the Results of Zero-Shot Image Captioning:
#### 3.1. Download the language model:
```bash
chmod +x ./download_model.sh
./download_model.sh
```

#### 3.2. Reproduce the results:
To reproduce the results in our case study (i.e., Figure 1) and in the appendix (i.e., Figure 4 in Appendix C), run the following command:

```bash
python image_caption_demo.py
```

#### 3.3. Inference results on MS-COCO and Flickr30k:
In the supplementary file, we also provide our inference results on MS-COCO (./image_captioning/magic_mscoco_result.json) and Flickr30k (./image_captioning/magic_flickr30k_result.json).

Each result file is a list of dictionaries, where each dictionary has the following format:
```text
{
   "split": Indicating which split (train, val, or test) the data instance belongs to.
   "image_name": The name of the corresponding image.
   "captions": A list of captions that the data instance contains.
   "prediction": The predicted result of the model.
}
```
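
As an illustration of how these files might be consumed, the following minimal sketch loads a result file and counts the records per split. It assumes the file is UTF-8 encoded JSON with exactly the four keys listed above. The helper names `load_results` and `count_by_split` are hypothetical and are not part of the released code.

```python
import json
from collections import Counter

# Keys documented in the data format above.
EXPECTED_KEYS = {"split", "image_name", "captions", "prediction"}


def load_results(path):
    """Load a result file and check that every record has the documented keys.

    Hypothetical helper: assumes the file is a UTF-8 JSON list of dictionaries.
    """
    with open(path, "r", encoding="utf-8") as f:
        results = json.load(f)
    for i, item in enumerate(results):
        missing = EXPECTED_KEYS - set(item)
        if missing:
            raise ValueError(f"record {i} is missing keys: {sorted(missing)}")
    return results


def count_by_split(results):
    """Count how many records belong to each split (train, val, or test)."""
    return Counter(item["split"] for item in results)
```

For example, calling `load_results("./image_captioning/magic_mscoco_result.json")` and passing the returned list to `count_by_split` would show how the MS-COCO predictions are distributed across splits. The `prediction` field of each record can then be compared against its `captions` list.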

### 4. Reproduce the Results of Visually Grounded Story Generation:
To reproduce the results in our case study (i.e., Figure 2) and in the appendix (i.e., Figure 5 in Appendix F), run the following command:

```bash
python story_generation_demo.py
```