# MIND_small_x1

+ **Dataset description:**
  
  MIND is a large-scale Microsoft news dataset for news recommendation. It was collected from anonymized behavior logs of the Microsoft News website. In total, MIND contains about 160k English news articles and more than 15 million impression logs generated by 1 million users. Every news article has rich textual content, including title, abstract, body, category, and entities. Each impression log records the clicked and non-clicked news for that impression, together with the user's news click history prior to the impression. The MIND-small version of the dataset is made by randomly sampling 50,000 users and their behavior logs from the MIND dataset.

  The dataset statistics are summarized as follows:

  | Dataset Split  | Total | #Train | #Validation | #Test | 
  | :--------: | :-----: |:-----: | :----------: | :----: | 
  | MIND_small_x1 |   8,584,442   | 5,843,444   |  2,740,998    |   | 

+ **Source:** https://msnews.github.io/index.html
+ **Download:** https://huggingface.co/datasets/reczoo/MIND_small_x1/tree/main
+ **RecZoo Datasets:** https://github.com/reczoo/Datasets

+ **Used by papers:**
  - Fangzhao Wu, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu, Xing Xie, Jianfeng Gao, Winnie Wu, Ming Zhou. [MIND: A Large-scale Dataset for News Recommendation](https://aclanthology.org/2020.acl-main.331). In ACL 2020.
  - Jian Li, Jieming Zhu, Qiwei Bi, Guohao Cai, Lifeng Shang, Zhenhua Dong, Xin Jiang, Qun Liu. [MINER: Multi-Interest Matching Network for News Recommendation](https://aclanthology.org/2022.findings-acl.29.pdf). In Findings of ACL 2022.
  - Qijiong Liu, Jieming Zhu, Quanyu Dai, Xiaoming Wu. [Boosting Deep CTR Prediction with a Plug-and-Play Pre-trainer for News Recommendation](https://aclanthology.org/2022.coling-1.249.pdf). In COLING 2022.
  
+ **Check the md5sum for data integrity:**
  ```bash
  $ md5sum train.csv valid.csv news_corpus.tsv
  51ac2a4514754078ad05b1028a4c7b9a  train.csv
  691961eb780f97b68606e4decebf2296  valid.csv
  51e0b3ae69deab32c7c3f6590f0dab72  news_corpus.tsv
  ```
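
  To run the same check from a script, a small helper along the following lines may be convenient. This is a minimal sketch, assuming GNU `md5sum` and `awk` are available; `verify_md5` is a hypothetical name used only for illustration and is not part of RecZoo.

  ```bash
  # Hypothetical helper: compare a file's MD5 digest with an expected value.
  # Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
  verify_md5() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
      echo "MISSING: $file" >&2
      return 2
    fi
    # md5sum prints "<digest>  <filename>"; keep only the digest.
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
      echo "OK: $file"
    else
      echo "MISMATCH: $file (expected $expected, got $actual)" >&2
      return 1
    fi
  }
  ```

  For example, `verify_md5 train.csv 51ac2a4514754078ad05b1028a4c7b9a` should print `OK: train.csv` when the downloaded file is intact.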
