High-Resolution Poverty Mapping with Foundation Models: A Cost-effective Approach from Street Views to Satellite Images
Abstract: Although standards of living are rising rapidly worldwide, a considerable segment of the global population still lives in poverty. Local governments and decision makers urgently need actionable, fine-scale poverty maps that locate low-income populations so resources can be delivered where they are needed. However, most existing studies produce coarse-resolution poverty maps (e.g., at the county level), which offer limited guidance for directing resources to the right locations. Such coarse-resolution maps are typically generated by machine learning models trained on higher-level economic statistics, which are widely available; fine-scale labels, by contrast, remain very scarce, because existing fine-scale maps rely on household-level visits that are expensive and time-consuming and are therefore available in only a limited number of cities. We develop a cost-effective approach to address this challenge. First, we design a multi-view training data construction method that combines street-view imagery with very-high-resolution satellite images. Next, we integrate different types of foundation models, including the general-purpose vision transformer ViT and the segmentation-focused SegFormer, for training and for map generation in new cities. By leveraging pretrained large models, we aim to improve generalizability with a smaller number of labeled samples. To validate the approach, we conducted a case study in Ghana covering the cities of Accra, Kumasi, and Tamale. The results show that the approach effectively captures low-income areas with distinctive characteristics, and that the foundation models generalize better with smaller training data sizes.
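The abstract does not give implementation details, but the multi-view idea it describes can be sketched as late fusion: each view (a street-view photo and a very-high-resolution satellite tile) is passed through its own pretrained encoder, and the resulting embeddings are concatenated before a small prediction head. The toy encoders, the fusion function, and the linear head below are all hypothetical stand-ins (in the paper's pipeline the encoders would be ViT and SegFormer backbones), shown only to illustrate the data flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_street_view(image):
    # Hypothetical stand-in for a pretrained ViT encoder:
    # reduces an (H, W, C) street-view photo to a per-channel embedding.
    return image.mean(axis=(0, 1))

def encode_satellite(image):
    # Hypothetical stand-in for a SegFormer-based encoder over a
    # very-high-resolution satellite tile.
    return image.std(axis=(0, 1))

def fuse_views(street_feat, sat_feat):
    # Late fusion: concatenate the per-view embeddings into one vector.
    return np.concatenate([street_feat, sat_feat])

# Toy inputs standing in for one street-view photo and one satellite tile.
street = rng.random((224, 224, 3))
tile = rng.random((256, 256, 3))

z = fuse_views(encode_street_view(street), encode_satellite(tile))

# A (hypothetical) linear head maps the fused vector to a low-income score.
w = rng.random(z.shape[0])
score = 1.0 / (1.0 + np.exp(-(z @ w)))  # sigmoid -> probability-like score
print(z.shape, float(score))
```

In a real pipeline the stand-in encoders would be replaced by frozen or fine-tuned foundation-model backbones, which is what lets the approach work with fewer labeled samples in new cities.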