What is Real Anymore? An AI/ML Image Dataset Using Authenticity Validation and Traceable Origins for Every Data Instance

Published: 02 Jan 2025, Last Modified: 03 Mar 2025
Venue: AAAI 2025 Workshop AIGOV Poster
License: CC BY 4.0
Keywords: AI-generated, AI-generated image detection, image dataset, image classification
TL;DR: An image dataset of photographs validated as authentic, each paired with a hyper-realistic AI-generated counterpart, intended to help prevent the manipulation and harm of innocent individuals across the globe.
Abstract:

This project addresses the increasing challenge of detecting AI-generated images by creating a novel dataset titled “What Is Real Anymore?” (WIRA). WIRA comprises two subsets. The first includes over 2,000 photographs sourced from Flickr and validated as authentically real against a set criterion. The second consists of a hyper-realistic AI-generated counterpart for each validated Flickr image, obtained through the Leonardo.AI commercial API. All validated Flickr images in WIRA are credited to their respective photographers and retain their associated rights. Commercial use of this dataset requires permission from the photographers or adherence to the copyright terms of each validated Flickr image used. This document details the rationale for image authentication, the image categories and the motive for their selection, the authenticity validation criterion, the methodology for creating the dataset, the computational resources used, a review of the records of inclusion and exclusion decisions, and potential enhancements to expand WIRA.

Submission Number: 19