Aggregated Individual Reporting for Post-Deployment Evaluation

Published: 10 Jun 2025, Last Modified: 30 Jun 2025 · MoFA Poster · CC BY 4.0
Keywords: post-deployment evaluation, reporting, position paper
Abstract: The need to develop model evaluations beyond static benchmarking is now well understood, bringing attention to post-deployment auditing and evaluation. At the same time, concerns about the concentration of power in deployed AI systems have sparked keen interest in “democratic” or “public” AI. In this work, we bring these two ideas together by proposing mechanisms for aggregated individual reporting (AIR), a framework for post-deployment evaluation that relies on individual reports from the public. AIR mechanisms allow those who interact with a specific, deployed (AI) system to report when they feel that they may have experienced something problematic; these reports are then aggregated over time, with the goal of evaluating the relevant system in a fine-grained manner. This position paper argues that individual experiences should be understood as an integral part of post-deployment evaluation, and that our proposed aggregated individual reporting mechanism is a practical path to that end. On the one hand, individual reporting can surface substantively novel insights about safety and performance; on the other, aggregation can be uniquely useful for informing action. From a normative perspective, the post-deployment phase completes a missing piece in the conversation about “democratic” AI. As a pathway to implementation, we provide a workflow of concrete design decisions and pointers to areas requiring further research and methodological development.
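To make the aggregation step concrete, the following is a minimal sketch (not from the paper) of how individual reports might be collected and aggregated over time. The report schema, the fixed time windows, and the simple count threshold for flagging a (window, category) pair are all illustrative assumptions; a real deployment would choose its own schema and decision rule.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical schema for a single individual report; the paper does not
# prescribe one, so these fields are illustrative only.
@dataclass
class Report:
    timestamp: datetime
    issue_category: str   # e.g. "hallucination", "refusal", "harmful content"
    free_text: str        # the reporter's own description of the experience


def aggregate_reports(reports, window=timedelta(days=7), flag_threshold=10):
    """Count reports per (time window, issue category) and flag any pair
    whose count reaches a simple threshold. The threshold rule is a stand-in
    for whatever aggregation/decision logic a real system would use."""
    if not reports:
        return {}, []
    start = min(r.timestamp for r in reports)
    counts = defaultdict(Counter)
    for r in reports:
        bucket = int((r.timestamp - start) / window)  # index of the time window
        counts[bucket][r.issue_category] += 1
    flags = [
        (bucket, category, n)
        for bucket, counter in counts.items()
        for category, n in counter.items()
        if n >= flag_threshold
    ]
    return counts, flags


if __name__ == "__main__":
    # Toy example: 12 reports of the same issue within one week trigger a flag.
    base = datetime(2025, 6, 1)
    reports = [Report(base + timedelta(hours=6 * i), "hallucination", "...")
               for i in range(12)]
    reports.append(Report(base + timedelta(days=9), "refusal", "..."))
    _, flags = aggregate_reports(reports)
    print(flags)  # -> [(0, 'hallucination', 12)]
```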
Submission Number: 24