# CLIP-ViL on Visual Question Answering and Image Captioning

In our paper "[How Much Can CLIP Benefit Vision-and-Language Tasks?]", we show that CLIP features improve over
traditional ResNet features on the visual question answering and image captioning tasks.

We release the extracted features and the code to reproduce our results here.

## Related Links
- CLIP: [paper](https://arxiv.org/abs/2103.00020), [code](https://github.com/openai/CLIP)
- Grid Features: [paper](https://arxiv.org/abs/2001.03615), [code](https://github.com/facebookresearch/grid-feats-vqa)

