# Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models

## ICLR Submission

We provide the nocaps dataset and our codebase in this supplementary material.

## Preparation

Prepare the text-only training dataset as a txt file containing one text sample per line.
Prepare the image-text pair training dataset as a json file, where each entry contains "filename" (the image path) and "caption" (the corresponding caption).
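As a minimal sketch, the two files described above could be prepared like this (the field names "filename" and "caption" follow the README; the file names and example values are placeholders, so adapt them to your own data layout):

```python
import json

# Image-text pairs: a json list of entries with "filename" and "caption"
# (the entries below are illustrative placeholders).
sample_pairs = [
    {"filename": "images/0001.jpg", "caption": "a dog running on the grass"},
    {"filename": "images/0002.jpg", "caption": "a red car parked by the road"},
]
with open("train_pairs.json", "w") as f:
    json.dump(sample_pairs, f, indent=2)

# Text-only data: a plain txt file, one text sample per line.
sample_texts = ["a dog running on the grass", "a red car parked by the road"]
with open("train_texts.txt", "w") as f:
    f.write("\n".join(sample_texts) + "\n")

# Read both files back to confirm the expected structure.
with open("train_pairs.json") as f:
    pairs = json.load(f)
with open("train_texts.txt") as f:
    texts = [line.strip() for line in f if line.strip()]
```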

Please refer to the main paper, the related work, and the appendix for details on reproducing XBT.

We will release the full version of our code publicly.

Thanks.
