Opportunities and Challenges of Frontier Data Governance With Synthetic Data

ICLR 2025 Workshop BuildingTrust Submission 79 Authors

10 Feb 2025 (modified: 06 Mar 2025). Submitted to BuildingTrust. License: CC BY 4.0
Track: Tiny Paper Track (between 2 and 4 pages)
Keywords: Synthetic Data, Data, AI Governance, Accountability, Trust, Regulation, Data Governance, Bias, Alignment
TL;DR: We outline 3 challenges posed by synthetic data that undermine current AI governance efforts, then propose 3 technical mechanisms that address these challenges and position synthetic data as a key regulatory lever for the future.
Abstract: Synthetic data, i.e., data generated by machine learning models, is emerging as a solution to the data access problem. However, its use introduces significant governance and accountability challenges and potentially undermines existing governance paradigms, such as compute and data governance. In this paper, we identify 3 key governance and accountability challenges that synthetic data poses: it can increase the emergence of malicious actors, spontaneous biases, and value drift. We then propose 3 technical mechanisms to address these specific challenges, applying synthetic data to adversarial training, bias mitigation, and value reinforcement. These mechanisms could not only counteract the risks of synthetic data but also serve as critical levers for frontier governance in the future.
Submission Number: 79