The Strong, weak and benign Goodhart's law. An independence-free and paradigm-agnostic formalisation
Keywords: Goodhart's law, alignment, metric design, safety
TL;DR: Relax the independence assumption made in previous work, in a paradigm-agnostic setting, to derive insights into the effect of Goodhart's law in light-tailed and heavy-tailed scenarios.
Abstract: Goodhart’s law is a famous adage in policy-making that states: “When a measure becomes a target, it ceases to be a good measure”. As machine learning models grow, along with the optimisation capacity used to train them, mounting empirical evidence has reinforced belief in the validity of this law, although the law itself has not been formalised. Recently, a few attempts have been made to formalise Goodhart’s law, either by categorising its variants or by studying how optimising a proxy metric affects the optimisation of an intended goal. In this work, we relax the simplifying independence assumption made in previous work, as well as the assumption on the learning paradigm made in most of it, to study how the coupling between the proxy metric and the intended goal affects Goodhart’s law. Our results show that, for a light-tailed goal and a light-tailed discrepancy, dependence does not change the nature of Goodhart’s effect. However, for a light-tailed goal and a heavy-tailed discrepancy, we exhibit an example where over-optimisation occurs at a rate inversely proportional to the heavy-tailedness of the discrepancy between the goal and the metric.
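To make the light-tailed versus heavy-tailed contrast concrete, here is a toy Monte Carlo sketch. It is not the construction used in the paper. It assumes an additive model in which the proxy is M = U + X, with a standard Gaussian goal U and a discrepancy X, and it models "optimising the proxy" as best-of-n selection on M. The names `gaussian_discrepancy`, `pareto_discrepancy` and `mean_selected_goal`, the linear coupling parameter `rho`, and the Pareto tail index `alpha` are all hypothetical illustrative choices.

```python
import math
import random
from typing import Callable

# Hypothetical additive model (an assumption, not the paper's construction):
#   goal   U ~ N(0, 1)      (light-tailed)
#   proxy  M = U + X        where X is the goal/metric discrepancy
# "Optimising the proxy" is modelled as best-of-n selection: draw n candidates
# and keep the one with the largest proxy value M.

Discrepancy = Callable[[random.Random, float], float]


def gaussian_discrepancy(rho: float) -> Discrepancy:
    """Light-tailed X = rho*U + sqrt(1 - rho^2)*Z with Z ~ N(0, 1).

    rho couples the discrepancy to the goal (rho = 0 is the independent case).
    Requires -1 < rho <= 1 so that the proxy is not identically zero.
    """
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must lie in (-1, 1]")
    scale = math.sqrt(1.0 - rho * rho)

    def sample(rng: random.Random, u: float) -> float:
        return rho * u + scale * rng.gauss(0.0, 1.0)

    return sample


def pareto_discrepancy(alpha: float) -> Discrepancy:
    """Heavy-tailed X with P(X > x) = x**-alpha for x >= 1, drawn independently of U.

    Smaller alpha means a heavier tail.
    """
    def sample(rng: random.Random, u: float) -> float:
        return rng.paretovariate(alpha)  # u is unused: X is independent of the goal

    return sample


def mean_selected_goal(n: int, trials: int, discrepancy: Discrepancy, seed: int = 0) -> float:
    """Average true goal U of the candidate that maximises the proxy M = U + X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best_proxy = -math.inf
        best_goal = 0.0
        for _ in range(n):
            u = rng.gauss(0.0, 1.0)
            m = u + discrepancy(rng, u)
            if m > best_proxy:
                best_proxy, best_goal = m, u
        total += best_goal
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        light = mean_selected_goal(n, 1000, gaussian_discrepancy(0.0))
        coupled = mean_selected_goal(n, 1000, gaussian_discrepancy(-0.5))
        heavy = mean_selected_goal(n, 1000, pareto_discrepancy(1.5))
        print(f"n={n:4d}  light={light:+.2f}  coupled={coupled:+.2f}  heavy={heavy:+.2f}")
```

In this toy model, the two Gaussian columns keep increasing with n, including the coupled case with rho = -0.5, whereas the Pareto column falls back toward 0, the goal's unoptimised mean, once n is large and selection is dominated by the discrepancy. The sketch only illustrates this qualitative contrast; it does not reproduce the rate result stated in the abstract.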
Submission Number: 14