Playing the Data: Video Games as a Tool to Annotate and Train Models on Large Datasets

Published: 10 Jun 2025, Last Modified: 30 Jun 2025, MoFA Poster, CC BY 4.0
Keywords: Citizen Science, Video Games, Data Annotation, Bioinformatics, Machine Learning Models, Human Feedback Integration
TL;DR: This paper explores citizen science projects embedded in video games as a method for generating high-quality biological data annotations to support the training and evaluation of AI models.
Abstract: Citizen science platforms can generate vast quantities of labeled data by engaging non-expert human contributors in solving tasks relevant to AI model development. In this work, we present insights from two deployed citizen science projects—Borderlands Science and Project Discovery—that have engaged millions of participants in annotating complex biological data. We discuss how human feedback collected via these platforms can be used to train or fine-tune AI models, with implications for learning from noisy demonstrations, preference aggregation, and biological discovery inspired by innate human intuition. We demonstrate how data from citizen science can be systematically used to train and evaluate machine learning models for biological sequence alignment and clustering, and propose a framework for aggregating and leveraging noisy human strategies at scale.
Submission Number: 56