A Large-scale Dataset with Behavior, Attributes, and Content of Mobile Short-video Platform

Yu Shang; Chen Gao; Nian Li; Yong Li

25 Sept 2024 (modified: 17 Dec 2024), ICLR 2025 Conference Withdrawn Submission, CC BY 4.0

Keywords: Large-scale dataset, Behavior, Attributes, Video content, Mobile short-video platform

TL;DR: We provide a large-scale dataset with rich user behavior, attributes and video content from a real mobile short-video platform.

Abstract: Short-video platforms have a growing impact on daily life, with billions of active users spending substantial time on them each day. The interactions between users and these platforms raise many scientific problems across computational social science and artificial intelligence. However, despite the rapid development of short-video platforms, existing datasets have serious shortcomings in three respects: inadequate user-video feedback, limited user attributes, and a lack of video content. To address these problems, we provide a large-scale dataset with rich user behavior, attributes, and video content from a real mobile short-video platform. The dataset covers 10,000 voluntary users and 153,561 videos, and we validate it technically in three ways. First, we verify the richness of the behavior data, including interaction frequency and feedback distribution. Second, we validate the wide coverage of the user-side and video-side attribute data. Third, we confirm the representational ability of the content features. We believe the dataset can support a broad research community, including work on user modeling, social science, and human behavior understanding. Our dataset is available at this anonymous link: http://101.6.70.16:8080/.

Primary Area: datasets and benchmarks

Submission Number: 4176
