RAG4DMC: Retrieval-Augmented Generation for Data-Level Modality Completion

RAG4DMC is a multimodal learning pipeline that integrates datasets such as COCO, Flickr30k, CC3M, and RSICD.
It supports:

Building internal and external knowledge bases

Cross-dataset embedding alignment

Multimodal retrieval

Data generation and candidate selection for missing-modality completion

Training an improved CLIP model and evaluating retrieval performance

Dependencies
conda create -n multimodal python=3.9 -y
conda activate multimodal

# Install PyTorch (the cu118 index below targets CUDA 11.8; change it to match your CUDA setup)
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

# Install project dependencies
pip install -r requirements.txt

Dataset Preparation
COCO 2017

The required subset is downloaded automatically into the following layout:

coco_data/
  ├── annotations/captions_train2017.json
  └── train2017/xxxx.jpg

Flickr30k

This dataset is not downloaded automatically. Provide the image archive and the captions CSV yourself:

flickr30k_kb/
  ├── flickr30k-images.zip
  └── flickr_annotations_30k.csv
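
Because these two files must be supplied by hand, it can help to confirm they are in place before starting a run. The check below is a minimal sketch and not part of the repository: the function name missing_flickr30k_files is hypothetical, and the file names are taken from the layout above.

```python
from pathlib import Path

# File names come from the expected layout; the helper itself is hypothetical.
REQUIRED_FLICKR30K_FILES = ("flickr30k-images.zip", "flickr_annotations_30k.csv")


def missing_flickr30k_files(root="flickr30k_kb"):
    """Return the required Flickr30k files that are absent from `root`."""
    root = Path(root)
    return [name for name in REQUIRED_FLICKR30K_FILES if not (root / name).is_file()]
```

An empty list means both files were found; otherwise the list names what still needs to be copied into flickr30k_kb/.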

CC3M

Loaded automatically from the Hugging Face Hub: pixparse/cc3m-wds.

RSICD

Loaded automatically from the Hugging Face Hub: arampacha/rsicd.
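
To inspect either Hub dataset outside the pipeline, something like the following should work. It is a sketch that assumes the Hugging Face datasets library is installed (it may or may not be listed in requirements.txt). The helper name open_hf_dataset is hypothetical, and streaming is used only to avoid downloading everything up front; how the pipeline itself loads the data may differ.

```python
# Repository IDs as given in this section; everything else is illustrative.
HF_REPOS = {"CC3M": "pixparse/cc3m-wds", "RSICD": "arampacha/rsicd"}


def open_hf_dataset(name, streaming=True):
    """Open one of the Hub datasets by its short name (hypothetical helper)."""
    # Imported lazily so this module loads even when `datasets` is not installed.
    from datasets import load_dataset

    # With no split argument, load_dataset returns a dict-like object of all splits.
    return load_dataset(HF_REPOS[name], streaming=streaming)
```

Calling open_hf_dataset("RSICD") returns the available splits keyed by name, which is a quick way to see what each repository provides.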

Usage
1. Train the Pipeline
python main.py \
  --dataset MScoco \
  --data_dir ./coco_data \
  --train_size 10000 \
  --test_size 1000 \
  --img_only_ratio 0.35 \
  --mlp_epochs 20 \
  --clip_epochs 20 \
  --N 10 \
  --k 10
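
The meaning of --img_only_ratio is not spelled out here. One plausible reading, given that the pipeline targets missing-modality completion, is that it sets the fraction of training pairs that keep only their image, with the caption treated as missing. The toy sketch below illustrates that reading only; it is an assumption, and drop_captions is a hypothetical helper rather than the repository's code.

```python
import random


def drop_captions(pairs, img_only_ratio, seed=0):
    """Return a copy of `pairs` in which a fraction of captions is set to None.

    Assumption: img_only_ratio is the share of samples left image-only.
    """
    rng = random.Random(seed)
    n_img_only = round(len(pairs) * img_only_ratio)
    # Pick which sample indices lose their caption.
    img_only = set(rng.sample(range(len(pairs)), n_img_only))
    return [(img, None if i in img_only else cap) for i, (img, cap) in enumerate(pairs)]


# Toy data: 20 (image path, caption) pairs.
pairs = [(f"img_{i}.jpg", f"caption {i}") for i in range(20)]
masked = drop_captions(pairs, img_only_ratio=0.35)
print(sum(cap is None for _, cap in masked), "of", len(masked), "samples are image-only")
```

Under this reading, --train_size 10000 with --img_only_ratio 0.35 would leave 3,500 training samples without captions for the pipeline to complete.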

2. Data Augmentation

The augmentation function that generates missing-modality samples can be imported from utils/dataset.py:

from utils.dataset import augment_dataset_img_txt_easy

3. Evaluation

After training, results are saved in:

evaluation/{dataset}/{train_size}/{img_only_ratio}/evaluation_results_*.json
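
To collect the saved results programmatically, a small loader along these lines can be used. It is a sketch: load_evaluation_results is a hypothetical helper, it assumes the directory names are the plain string forms of the command-line values (for example 0.35 for --img_only_ratio), and it makes no assumption about the keys inside each JSON file.

```python
import json
from pathlib import Path


def load_evaluation_results(dataset, train_size, img_only_ratio, root="evaluation"):
    """Load every evaluation_results_*.json for one run configuration.

    Returns a dict mapping each file name to its parsed JSON content.
    """
    run_dir = Path(root) / str(dataset) / str(train_size) / str(img_only_ratio)
    results = {}
    for path in sorted(run_dir.glob("evaluation_results_*.json")):
        with path.open(encoding="utf-8") as f:
            results[path.name] = json.load(f)
    return results
```

For the training command above, load_evaluation_results("MScoco", 10000, 0.35) would look in evaluation/MScoco/10000/0.35/, provided the directory names follow that form. An empty dict means no matching files were found.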

Project Structure
.
├── main.py                 # Main training entry
├── data_processor.py       # Dataset processors (COCO, Flickr30k, RSICD, CC3M)
├── models/                 
│   ├── clip_model.py       # CLIP training wrapper
│   ├── bidirectional_mlp.py  # Bidirectional MLP
│   └── generator.py        # RAG generator
├── utils/
│   ├── knowledge_base.py   # Knowledge base construction
│   ├── easy.py             # Easy alignment utils
│   ├── dataset.py          # Data augmentation
│   ├── evaluation.py       # Evaluation metrics
│   └── niqe.py             # Image quality assessment
├── requirements.txt
└── README.md