Keywords: reliable machine learning, strategic classification, strategic regression, strategic manipulation, feature selection
Abstract: Algorithmic prediction rules are increasingly used to allocate resources, for example to target households for social welfare programs, determine payments to Medicare Advantage insurers, and assign eligibility for social benefits. Each of these uses creates incentives for strategic manipulation of input features. Policymakers often respond by excluding manipulable features from the prediction model, yet it is not well understood when doing so reduces prediction risk. In this paper, we analyze feature selection under strategic behavior in a linear regression setting, motivated by risk adjustment models in U.S. health policy. Our model characterizes how organizations strategically manipulate reported features in response to decision rules, and how a regulator can counteract such behavior through feature selection. We establish sufficient conditions on the cost structure of feature manipulation that identify when excluding manipulable features reduces prediction risk and, conversely, when retaining the full feature set yields more accurate predictions. These results offer a first step toward principled feature selection methods that explicitly account for unreliable and strategically manipulated data inputs.
Submission Number: 283