# MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

# The MME evaluation dataset is collected by Xiamen University for academic research only. Commercial use in any form is prohibited. 
We have made efforts to comply with the licenses of the used publicly available datasets. If there is any infringement in the MME dataset, please email guilinli@stu.xmu.edu.cn to remove it. The copyright of all images in the used publicly available datasets belongs to the image owners.
Without prior approval from Xiamen University, you cannot distribute, publish, copy, disseminate, or modify the MME dataset in whole or in part.
The MME dataset can only be used if you agree to the above restrictions.

# Statement: The images of the Landmark and Artwork datasets need to be downloaded manually according to their data-use licenses, and the download methods are explained in the corresponding folders.
The sources of the used publicly available datasets in MME are listed as follows. Please follow the corresponding data-use license as required. 

# An automated evaluation tool for the accuracy, accuracy+, and score can be found in the folder "eval_tool".
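
The following is a minimal sketch of how these three metrics might be computed for one subtask. It assumes that each image comes with yes/no questions, that accuracy is computed per question, that accuracy+ counts an image only when all of its questions are answered correctly, and that the score is the sum of the two percentages. The function name `mme_subtask_metrics` and the record layout are hypothetical and do not reflect the actual interface of the script in "eval_tool".

```python
from collections import defaultdict


def mme_subtask_metrics(records):
    """Compute accuracy, accuracy+, and score for one subtask.

    `records` is an iterable of (image_id, ground_truth, prediction)
    tuples with "yes"/"no" answers. This layout is an assumption made
    for illustration, not the format used by eval_tool.
    """
    # Group the per-question correctness flags by image.
    per_image = defaultdict(list)
    for image_id, truth, prediction in records:
        is_correct = truth.strip().lower() == prediction.strip().lower()
        per_image[image_id].append(is_correct)

    total_questions = sum(len(flags) for flags in per_image.values())
    if total_questions == 0:
        raise ValueError("no records to evaluate")

    # accuracy: fraction of individual questions answered correctly.
    correct_questions = sum(sum(flags) for flags in per_image.values())
    accuracy = 100.0 * correct_questions / total_questions

    # accuracy+: fraction of images whose questions are all correct.
    correct_images = sum(all(flags) for flags in per_image.values())
    accuracy_plus = 100.0 * correct_images / len(per_image)

    # score: assumed here to be the sum of the two percentages.
    return {
        "accuracy": accuracy,
        "accuracy_plus": accuracy_plus,
        "score": accuracy + accuracy_plus,
    }


if __name__ == "__main__":
    example = [
        ("a.jpg", "yes", "yes"),
        ("a.jpg", "no", "no"),
        ("b.jpg", "yes", "yes"),
        ("b.jpg", "no", "yes"),
    ]
    print(mme_subtask_metrics(example))
```

In this example three of four questions are correct (accuracy 75.0) and one of two images is fully correct (accuracy+ 50.0), so the score is 125.0.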

# The used publicly available datasets:

- Existence, Count, Position, and Color:
Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Dollár, Piotr and Zitnick, C Lawrence. Microsoft COCO: Common objects in context. ECCV 2014.

- Poster and Celebrity: (Note that for Celebrity, we draw a red box around a person with a clearly visible face in the image.)
Huang, Qingqiu and Xiong, Yu and Rao, Anyi and Wang, Jiaze and Lin, Dahua. MovieNet: A Holistic Dataset for Movie Understanding. ECCV 2020.

- Scene:
Zhou, Bolei and Lapedriza, Agata and Xiao, Jianxiong and Torralba, Antonio and Oliva, Aude. Learning deep features for scene recognition using places database. NeurIPS 2014.
Zhou, Bolei and Lapedriza, Agata and Khosla, Aditya and Oliva, Aude and Torralba, Antonio. Places: A 10 million image database for scene recognition. IEEE TPAMI 2017.

- Landmark:
Weyand, Tobias and Araujo, Andre and Cao, Bingyi and Sim, Jack. Google Landmarks Dataset v2: A large-scale benchmark for instance-level recognition and retrieval. CVPR 2020.

- Artwork: 
Mao, Hui and Cheung, Ming and She, James. Deepart: Learning joint representations of visual arts. ACM 2017.
Mao, Hui and She, James and Cheung, Ming. Visual Arts Search on Mobile Devices. TOMM 2019.

- OCR:
Liu, Yuliang and Jin, Lianwen and Zhang, Shuaitao and Luo, Canjie and Zhang, Sheng. Curved scene text detection via transverse and longitudinal sequence connection. PR 2019.


