Poster: pdf
Keywords: Credit scoring, Ensemble models, Financial inclusion, Emerging markets, Africa, Microfinance, Data scarcity, Logistic regression, SHAP, Threshold optimization, SMOTE, Alternative data
TL;DR: This paper presents a cost-sensitive, interpretable machine learning framework that improves credit scoring accuracy in African microfinance by combining alternative data, ensemble models, and threshold optimization under data-scarce conditions.
Abstract: A major obstacle to financial inclusion in Africa is the lack of data, a problem exacerbated by erratic swings in the financial and economic landscape across the continent. We develop an “African context-aware” credit-risk framework to address data scarcity and volatility in African microfinance. Our approach combines key financial ratios, robust median imputation, and temporal feature engineering. Using a pipeline of logistic regression, random forests, gradient boosting, and SVMs, augmented by SMOTE balancing and ANOVA feature selection, we find the surprising result that, on a representative dataset of 10,000 people from West Africa, logistic regression performs better than the more complex ensembles (AUC-ROC = 0.603). Importantly, our new cost-sensitive thresholding reduces expected financial losses by 35%. The study closes with practical recommendations that emphasize SHAP-based interpretability and deployment in resource-constrained environments.
Submission Number: 3
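To illustrate the cost-sensitive thresholding mentioned in the abstract, here is a minimal Python sketch of one common way to pick a decision threshold by minimizing misclassification cost. It is not the authors' implementation: the function names `expected_cost` and `best_threshold`, the cost values `cost_fn` and `cost_fp`, and the threshold grid are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def expected_cost(
    y_true: Sequence[int],
    p_default: Sequence[float],
    threshold: float,
    cost_fn: float,
    cost_fp: float,
) -> float:
    """Total cost when applicants with p_default >= threshold are rejected.

    cost_fn: assumed loss from approving a loan that defaults (false negative).
    cost_fp: assumed loss from rejecting a borrower who would repay (false positive).
    """
    cost = 0.0
    for y, p in zip(y_true, p_default):
        flagged = p >= threshold
        if y == 1 and not flagged:
            cost += cost_fn  # missed default
        elif y == 0 and flagged:
            cost += cost_fp  # lost good customer
    return cost


def best_threshold(
    y_true: Sequence[int],
    p_default: Sequence[float],
    cost_fn: float = 5.0,  # hypothetical cost ratio, not taken from the paper
    cost_fp: float = 1.0,
    grid: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Scan a grid of thresholds and return (threshold, cost) with the lowest cost."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    # Ties on cost are broken in favor of the lowest threshold.
    cost, threshold = min(
        (expected_cost(y_true, p_default, t, cost_fn, cost_fp), t) for t in grid
    )
    return threshold, cost
```

In practice the threshold would be chosen on a validation split and the two costs would come from the lender's own loss and margin estimates. When a missed default is assumed to cost more than a rejected good borrower, the selected threshold typically falls below the default 0.5.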