Identifying Key Predictors of Food Security in Mexico Using Machine Learning Models

Published: 22 Sept 2025, Last Modified: 22 Sept 2025. Venue: WiML @ NeurIPS 2025. License: CC BY 4.0
Keywords: Food insecurity, Mexico, ELCSA, Household characteristics, Machine learning, XGBoost
Abstract: The purpose of this paper is to measure the prevalence of food insecurity in Mexico using the National Survey of Household Income and Expenditure 2022 (in Spanish, ENIGH 2022) and the experience-based Latin American and Caribbean Food Security Scale (in Spanish, ELCSA). It also draws on the revised Health Purchase Index (r-HPI) of Perignon et al. (2023), which evaluates the dietary diversity and nutritional quality of household food purchases. This study analysed 90,102 household records, providing a nationally representative sample. The Rasch model was used to confirm that the ELCSA is a valid measurement tool. A logit regression model and three machine learning approaches, namely Least Absolute Shrinkage and Selection Operator (LASSO), Extreme Gradient Boosting (XGBoost), and Random Forest (RF), were used to identify the predictors of food insecurity. The results indicate that the prevalence of moderate to severe food insecurity was 20.86%, or roughly 26.8 million people, and that the prevalence of severe food insecurity was 3.87%, or roughly 4.98 million people, based on Mexico’s 2022 population of 128.6 million. The statistical validation tests recommended by the Food and Agriculture Organization of the United Nations (FAO) showed that the infit, outfit, reliability, and residual correlation statistics of the ELCSA data were within the expected ranges. The tuned XGBoost model was selected as the preferred machine learning algorithm for prediction because it achieved the highest accuracy, F1 score, and AUC, along with strong precision and recall. SHAP (SHapley Additive exPlanations) value analysis of the tuned XGBoost model revealed that the main predictors of food security were log total income, education, household size, number of income earners, and age. The groups most vulnerable to food insecurity included households whose heads were aged between 20 and 25 or over 55, households with low dietary quality, and households with a high food expenditure ratio.
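One way to obtain prevalence estimates of this kind is to classify each household by its ELCSA raw score and then average that classification using survey weights. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the conventional ELCSA raw-score cut-offs (8 items for adult-only households, 15 for households with members under 18), a hypothetical `Household` record holding the affirmative count and the ENIGH expansion factor, and invented toy data. The last step scales a prevalence by the 128.6 million population stated in the abstract, which gives roughly 26.8 million people for 20.86% and roughly 4.98 million for 3.87%.

```python
from dataclasses import dataclass

# Mexico's 2022 population as stated in the abstract.
POPULATION_2022 = 128.6e6


@dataclass
class Household:
    """Hypothetical household record (field names are illustrative)."""
    affirmative: int   # number of "yes" answers to ELCSA items
    has_minors: bool   # True: 15-item scale; False: 8-item adult-only scale
    weight: float      # survey expansion factor


def elcsa_category(affirmative: int, has_minors: bool) -> str:
    """Map an ELCSA raw score to a severity level using assumed standard cut-offs."""
    # (upper bound of mild, upper bound of moderate, number of items)
    mild_max, moderate_max, n_items = (5, 10, 15) if has_minors else (3, 6, 8)
    if not 0 <= affirmative <= n_items:
        raise ValueError(f"raw score {affirmative} outside 0..{n_items}")
    if affirmative == 0:
        return "secure"
    if affirmative <= mild_max:
        return "mild"
    if affirmative <= moderate_max:
        return "moderate"
    return "severe"


def weighted_prevalence(households, levels) -> float:
    """Weighted share of households whose ELCSA category is in `levels`."""
    total = sum(h.weight for h in households)
    hits = sum(h.weight for h in households
               if elcsa_category(h.affirmative, h.has_minors) in levels)
    return hits / total


def affected_people(prevalence: float, population: float = POPULATION_2022) -> float:
    """Scale a prevalence (0 to 1) to an absolute number of people."""
    return prevalence * population


# Invented toy records, not ENIGH 2022 data.
sample = [
    Household(affirmative=0, has_minors=True, weight=100.0),   # secure
    Household(affirmative=7, has_minors=True, weight=50.0),    # moderate
    Household(affirmative=8, has_minors=False, weight=30.0),   # severe
    Household(affirmative=2, has_minors=False, weight=20.0),   # mild
]
print(weighted_prevalence(sample, {"moderate", "severe"}))  # 80 / 200 -> 0.4
print(f"{affected_people(0.2086) / 1e6:.1f} million")     # 26.8 million
print(f"{affected_people(0.0387) / 1e6:.2f} million")     # 4.98 million
```

Weighting by the expansion factor alone yields a share of households; multiplying each household's weight by its size would yield a share of people instead.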
It is concluded that the ELCSA is a valid measurement tool for assessing the prevalence of food insecurity in Mexico and that the tuned XGBoost model can be used as a predictive machine learning (ML) model for food security. Therefore, based on these results and supporting evidence from the literature, it is recommended that social policies related to health, education, and career development, particularly cash transfers to the most vulnerable groups, be improved in efficacy and quality, not only in coverage, in order to reduce food insecurity.
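Validating the ELCSA with the Rasch model relies in part on item fit statistics. As a hedged sketch, assuming respondent severities `thetas` and item severities `difficulties` have already been estimated (the estimation step is omitted), the infit and outfit mean-square statistics for each item under the dichotomous Rasch model can be computed as below. The acceptance band `INFIT_BAND = (0.7, 1.3)` is an assumption based on commonly cited guidance for experience-based scales, not a threshold taken from this paper, and the toy responses are invented.

```python
import math

# Assumed acceptance band for infit; not taken from the paper.
INFIT_BAND = (0.7, 1.3)


def rasch_prob(theta: float, b: float) -> float:
    """Probability of an affirmative answer under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses, thetas, difficulties):
    """Return a list of (infit, outfit) mean-square statistics, one per item.

    responses[i][j] is 1 if respondent i affirmed item j, else 0. Respondents
    with extreme raw scores (all 0 or all 1) are assumed to be excluded already,
    because their severity estimate is not finite.
    """
    fits = []
    for j, b in enumerate(difficulties):
        sq_resid = 0.0  # sum of squared raw residuals (x - P)^2
        info = 0.0      # sum of model variances P(1 - P)
        z_sq = 0.0      # sum of squared standardized residuals
        for row, theta in zip(responses, thetas):
            p = rasch_prob(theta, b)
            w = p * (1.0 - p)
            r = row[j] - p
            sq_resid += r * r
            info += w
            z_sq += r * r / w
        infit = sq_resid / info        # information-weighted mean square
        outfit = z_sq / len(thetas)    # unweighted mean square
        fits.append((infit, outfit))
    return fits


# Invented toy data: one severity value per raw score (1 -> -0.4, 2 -> 0.6).
responses = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 1, 0]]
thetas = [-0.4, 0.6, 0.6, -0.4]
difficulties = [-0.6, 0.1, 0.9]

for j, (infit, outfit) in enumerate(item_fit(responses, thetas, difficulties)):
    inside = INFIT_BAND[0] <= infit <= INFIT_BAND[1]
    note = "" if inside else "  (outside assumed band)"
    print(f"item {j}: infit={infit:.2f}, outfit={outfit:.2f}{note}")
```

Infit weights each squared residual by the information it carries, so it is less sensitive than outfit to unexpected answers from respondents whose severity lies far from the item's severity.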
Submission Number: 245