Lossgate: Incomplete Information and Misaligned Incentives Hinder Regulation of Societal Risks in Machine Learning
Keywords: ML regulation, fairness, privacy
Abstract: Regulators seek to curb the societal risks of machine learning; a common aim is to protect the public from excessive privacy violations or bias in models. In the status quo, regulators and companies evaluate societal risk independently. We find that discrepancies between these evaluations can either hurt or benefit companies. To abide by regulation, a company must evaluate risk conservatively: it should train its model so that risk remains below the acceptable threshold even if the regulator's evaluation returns higher risk measurements. This decreases model utility (by up to 8% in our experiments). Conversely, when the regulator's measurements are consistently lower than the company's own, we find that the company can behave strategically and game regulation to train more accurate models. We call this Lossgate, an allusion to Dieselgate in environmental regulation, in which Volkswagen produced cars that limited their emissions when subjected to a regulator's emissions measurement. To model the incomplete information and misaligned incentives that explain Lossgate, we leverage game theory. We obtain SpecGame, a model of regulator-company interactions that allows us to estimate the excess risk resulting from the strategic behavior observed in Lossgate. We show that, compared to collaborative regulation, Lossgate raises the total cost summed over all players by 70–96%.
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11681