Tenrec: A Large-scale Multipurpose Benchmark Dataset for Recommender SystemsDownload PDF

Published: 17 Sept 2022, Last Modified: 12 Mar 2024NeurIPS 2022 Datasets and Benchmarks Readers: Everyone
Keywords: Recommendation, Large-scale, Multipurpose, Dataset, Benchmark
Abstract: Existing benchmark datasets for recommender systems (RS) either are created at a small scale or involve very limited forms of user feedback. RS models evaluated on such datasets often lack practical values for large-scale real-world applications. In this paper, we describe Tenrec, a novel and publicly available data collection for RS that records various user feedback from four different recommendation scenarios. To be specific, Tenrec has the following five characteristics: (1) it is large-scale, containing around 5 million users and 140 million interactions; (2) it has not only positive user feedback, but also true negative feedback (vs. one-class recommendation); (3) it contains overlapped users and items across four different scenarios; (4) it contains various types of user positive feedback, in forms of clicking, liking, sharing, and following, etc; (5) it contains additional features beyond the user IDs and item IDs. We verify Tenrec on ten diverse recommendation tasks by running several classical baseline models per task. Tenrec has the potential to become a useful benchmark dataset for a majority of popular recommendation tasks. Our source codes and datasets will be included in supplementary materials.
Author Statement: Yes
Supplementary Material: pdf
URL: https://github.com/yuangh-x/2022-NIPS-Tenrec
Dataset Url: https://static.qblv.qq.com/qblv/h5/algo-frontend/tenrec_dataset.html
License: Dataset License: CC BY NC 4.0 Code License: Apache License 2.0
Contribution Process Agreement: Yes
In Person Attendance: No
Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 1 code implementation](https://www.catalyzex.com/paper/arxiv:2210.10629/code)
22 Replies