# MultiID-2M (Subset) for Anonymous Review

This repository provides a subset of our proposed MultiID-2M dataset, half of the paired data, as supplementary material for anonymous review. The full dataset will be released upon acceptance.

The included CSV files contain 200k data entries. Each entry includes the following fields:

- **name**: An anonymized identifier for the individuals in the image. Each name is mapped to a unique number to protect privacy.
- **url**: A direct link to the original image.
- **bboxes**: Bounding boxes for the faces detected in the image.
- **text_bboxes**: OCR-detected bounding boxes for any text present in the image.
- **ram\***: Scores from the "Recognize Anything Model," indicating the likelihood that the image is a photograph, collage, advertisement, portrait, print, or cartoon.
- **caption_en**: An English caption describing the image.
- **crop**: A bounding box for cropping out unwanted text while preserving the face area. If `text_bboxes` is `None`, this will be `[0, 0, height, width]` (i.e., the entire image).
- **aesthetic_score**: An aesthetic quality score for the image.
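
The fields above can be read with the Python standard library. The following is a minimal sketch, not the loader used by the authors: it assumes that `bboxes`, `text_bboxes`, and `crop` are stored as Python-literal strings (for example `"[[10, 20, 110, 140]]"` or `None`) and that `aesthetic_score` is a plain number. The helper names `parse_literal` and `load_entries` and the sample row are hypothetical and only illustrate the idea.

```python
import ast
import csv
import io

# Columns assumed (not confirmed by the dataset) to hold Python-literal lists or "None".
LIST_FIELDS = ("bboxes", "text_bboxes", "crop")


def parse_literal(value):
    """Parse a cell such as "[[10, 20, 110, 140]]" or "None"; empty cells become None."""
    if value is None or value.strip() == "":
        return None
    return ast.literal_eval(value)


def load_entries(fileobj):
    """Read rows from an open CSV text file and decode list-valued and numeric fields."""
    entries = []
    for row in csv.DictReader(fileobj):
        for field in LIST_FIELDS:
            if field in row:
                row[field] = parse_literal(row[field])
        if row.get("aesthetic_score"):
            row["aesthetic_score"] = float(row["aesthetic_score"])
        entries.append(row)
    return entries


# Made-up sample row, for illustration only; real files may contain more columns.
SAMPLE_CSV = (
    "name,url,bboxes,text_bboxes,caption_en,crop,aesthetic_score\n"
    '1234,https://example.com/a.jpg,"[[10, 20, 110, 140]]",None,'
    'A person smiling,"[0, 0, 512, 384]",5.8\n'
)

entries = load_entries(io.StringIO(SAMPLE_CSV))
print(entries[0]["bboxes"], entries[0]["text_bboxes"], entries[0]["crop"])
```

To read a real file, pass an open file handle instead of the in-memory sample, e.g. `load_entries(open(path, newline="", encoding="utf-8"))`, where `path` points to one of the included CSV files.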

A Python notebook is provided to demonstrate the dataset. It displays a randomly selected entry from the CSV file.
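
For readers who want the same behavior outside the notebook, here is a rough sketch of picking a random entry using only the standard library. It is not the notebook's code: the function name `pick_random_entry`, the `seed` parameter, and the sample rows are assumptions made for illustration, and the sketch prints the fields rather than downloading or rendering the image.

```python
import csv
import io
import random


def pick_random_entry(fileobj, seed=None):
    """Return one randomly chosen row (as a dict) from an open CSV text file."""
    rows = list(csv.DictReader(fileobj))
    if not rows:
        raise ValueError("CSV file contains no entries")
    # A dedicated Random instance keeps the choice reproducible when a seed is given.
    return random.Random(seed).choice(rows)


# Made-up rows, for illustration only.
SAMPLE_ROWS = (
    "name,url,caption_en\n"
    "1,https://example.com/a.jpg,A person smiling\n"
    "2,https://example.com/b.jpg,Two people talking\n"
)

entry = pick_random_entry(io.StringIO(SAMPLE_ROWS), seed=0)
for key, value in entry.items():
    print(f"{key}: {value}")
```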