Aggregated Individual Reporting for Post-Deployment Evaluation: Mechanism Design & Modeling Considerations

Published: 29 Sept 2025, Last Modified: 12 Oct 2025. NeurIPS 2025 Reliable ML Workshop. License: CC BY 4.0
Keywords: individual reporting, collective action, biased data, mechanism design, incentives
Abstract: Evaluating the real-world behavior of AI systems once they are deployed is a key component of understanding their societal impact. One approach proposed in recent work is aggregated individual reporting (AIR), where end-users of deployed systems submit feedback ("reports") to a central mechanism, which aggregates these reports to compute an evaluation of the deployed system. The goal of the mechanism is to understand the true state of the world based on the submitted reports. A key open question is how an optimal AIR mechanism might be designed when reporting behavior is taken explicitly into account. This gives rise to two simultaneous challenges: first, designing rewards for reporting in order to incentivize high-quality feedback; and second, making reliable (statistical) inferences from an inherently "non-i.i.d." sample of information. In this extended abstract, we describe work in progress that seeks to initiate rigorous study of these problems. We provide a "maximalist" model of the interaction between end-users and an AIR mechanism, as well as some "minimal" example instantiations of the model. We discuss various research questions that naturally arise from this model.
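The "non-i.i.d." inference challenge the abstract raises can be made concrete with a toy sketch. The setup below is not the authors' model; it is a hypothetical minimal instantiation in which each user's decision to report depends on the outcome they experienced, so the mechanism sees a selected sample. All names (`simulate_reports`, `naive_estimate`, `ipw_estimate`) and the specific parameters are illustrative assumptions, as is the choice of inverse-propensity weighting as one possible correction.

```python
import random

def simulate_reports(n_users, harm_rate, p_report_harm, p_report_ok, seed=0):
    """Toy AIR world (hypothetical): each user experiences a binary outcome
    (1 = harmed), then reports it with an outcome-dependent probability.
    The mechanism only observes the reports that were actually submitted."""
    rng = random.Random(seed)
    reports = []  # outcomes among users who chose to report
    for _ in range(n_users):
        harmed = rng.random() < harm_rate
        p = p_report_harm if harmed else p_report_ok
        if rng.random() < p:
            reports.append(1 if harmed else 0)
    return reports

def naive_estimate(reports):
    """Average of submitted reports: biased upward whenever harmed users
    are more likely to report than satisfied ones."""
    return sum(reports) / len(reports)

def ipw_estimate(reports, n_users, p_report_harm, p_report_ok):
    """One possible correction: inverse-propensity weighting, assuming the
    mechanism knows (or can estimate) the reporting propensities."""
    weighted = sum(
        (1.0 / p_report_harm) if r == 1 else 0.0 for r in reports
    )
    return weighted / n_users

if __name__ == "__main__":
    # True harm rate is 10%, but harmed users report 10x more often.
    reports = simulate_reports(
        n_users=100_000, harm_rate=0.10,
        p_report_harm=0.50, p_report_ok=0.05,
    )
    print("naive:", naive_estimate(reports))   # far above the true 0.10
    print("ipw:  ", ipw_estimate(reports, 100_000, 0.50, 0.05))  # near 0.10
```

In this toy world the naive report average lands near 0.5 even though only 10% of users were harmed, which is the selection problem an AIR mechanism must reason about; the IPW correction works only under the strong (and here assumed) condition that reporting propensities are known, which connects directly to the abstract's first challenge of designing rewards that shape those propensities.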
Submission Number: 190