# MIND_large_x1

+ **Dataset description:**
  
  MIND is a large-scale Microsoft news dataset for news recommendation, collected from anonymized behavior logs of the Microsoft News website. In total, it contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article contains rich textual content, including title, abstract, body, category, and entities. Each impression log contains the clicked events, the non-clicked events, and the user's historical news click behaviors before that impression.

  The dataset statistics are summarized as follows:

  | Dataset Split  | Total | #Train | #Validation | #Test | 
  | :--------: | :-----: |:-----: | :----------: | :----: | 
  | MIND_large_x1 |      |    |      |     | 

+ **Source:** https://msnews.github.io/index.html
+ **Download:** https://huggingface.co/datasets/reczoo/MIND_large_x1/tree/main
+ **RecZoo Datasets:** https://github.com/reczoo/Datasets

+ **Used by papers:**
  - Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, Ming Zhou. [MIND: A Large-scale Dataset for News Recommendation](https://aclanthology.org/2020.acl-main.331). In ACL 2020.
  - Jian Li, Jieming Zhu, Qiwei Bi, Guohao Cai, Lifeng Shang, Zhenhua Dong, Xin Jiang, Qun Liu. [MINER: Multi-Interest Matching Network for News Recommendation](https://aclanthology.org/2022.findings-acl.29.pdf). In Findings of ACL 2022.
  - Qijiong Liu, Jieming Zhu, Quanyu Dai, Xiaoming Wu. [Boosting Deep CTR Prediction with a Plug-and-Play Pre-trainer for News Recommendation](https://aclanthology.org/2022.coling-1.249.pdf). In COLING 2022.
  
+ **Check the md5sum for data integrity:**
  ```bash
  $ md5sum train.csv valid.csv test.csv news_corpus.tsv
  955b80b959fb15076a0568d82da6bf05  train.csv
  4942111ca7ba975b5f5dae8e2c54f1f0  valid.csv
  cbd5e69d573dc471d9f9ae91f2b5690f  test.csv
  9007e6b9127ff71bf146b7cfc1dc842d  news_corpus.tsv
  ```
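
Where `md5sum` is not available (for example on some Windows setups), the same check can be sketched in Python with the standard `hashlib` module. This is a minimal sketch, not part of the dataset tooling: the helper names `file_md5` and `verify_files` are hypothetical, and it assumes the four files sit in the current directory. The expected digests are copied from the listing above.

```python
import hashlib
from pathlib import Path

# Expected checksums, copied from the md5sum listing above.
EXPECTED_MD5 = {
    "train.csv": "955b80b959fb15076a0568d82da6bf05",
    "valid.csv": "4942111ca7ba975b5f5dae8e2c54f1f0",
    "test.csv": "cbd5e69d573dc471d9f9ae91f2b5690f",
    "news_corpus.tsv": "9007e6b9127ff71bf146b7cfc1dc842d",
}


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(data_dir=".", expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, md5 in expected.items():
        path = Path(data_dir) / name
        results[name] = (file_md5(path) == md5) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "MISSING"}
    for name, ok in verify_files().items():
        print(f"{name}: {labels[ok]}")
```

Reading in chunks keeps memory use flat for the larger files. A `MISMATCH` usually points to an incomplete or corrupted download, in which case re-downloading the file is the simplest fix.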
