The Tournesol dataset: Which videos should be more largely recommended?

21 May 2024 (modified: 13 Nov 2024). Submitted to NeurIPS 2024 Track Datasets and Benchmarks. License: CC BY 4.0
Keywords: Recommendation, Ethics, Preferences, Human
TL;DR: We publish a dataset of human judgments on which videos should be more largely recommended by algorithms.
Abstract: This paper introduces the Tournesol public dataset, which was collected through the deployed online platform https://tournesol.app. The dataset contains 200,000 comparative judgments made by Tournesol’s 20,000 users on which YouTube videos should be more largely recommended. It also provides 600,000 comparisons along secondary criteria such as content reliability, topic importance and layman-friendliness, and it includes information about users’ pretrust statuses and vouches. It is published at https://api.tournesol.app/exports/all under the ODC-By license. Tournesol currently uses the data to make community-driven video content recommendations to over 10,000 users.
Supplementary Material: zip
Submission Number: 624