Keywords: Poverty targeting, machine learning, covariate shift
Abstract: A key challenge in the design of effective social protection programs is determining who should be eligible for program benefits. In low- and middle-income countries, one of the most common criteria is a Proxy Means Test (PMT) -- a rudimentary application of machine learning that uses a short list of household characteristics to predict whether each household is poor, and therefore eligible, or non-poor, and therefore ineligible. Using nationwide survey data from six low- and middle-income countries, this paper documents an important weakness in this use of machine learning: the accuracy of the PMT prediction algorithm decreases steadily over time, by roughly 1.5-1.9 percentage points per year. We illustrate the implications of this finding for real-world anti-poverty programs, which typically update the PMT model only every 5-8 years, and then show that the aggregate effect can be decomposed into two forces: "model decay" caused by model drift, and "data decay" caused by changing household characteristics. Our final set of results shows how an understanding of these forces can be used to optimize data collection policies to improve the efficiency of social protection programs.
Submission Number: 19