MACHINE LEARNING APPROACHES FOR ELECTRICITY THEFT AND NON-TECHNICAL LOSS DETECTION: A COMPARATIVE ANALYSIS OF LOGISTIC REGRESSION, GRADIENT BOOSTING, AND LSTM ARCHITECTURES
Keywords: electricity theft detection; non-technical losses; LightGBM; LSTM; SMOTE; SHAP explainability; precision-recall; concept drift; billing audit; smart meter fraud
Abstract: Electricity theft and non-technical losses (NTL) cause annual global losses exceeding USD
96 billion. This paper proposes and evaluates a three-phase machine learning pipeline for
electricity theft detection using a real-world dataset of 135,493 utility clients and 4,454,637
invoice records (2005–2019). We engineer 68 features across eight groups, introducing a
novel billing arithmetic audit (specifically, the index delta mismatch flag and the active-meter zero-consumption flag) that achieves fraud rate lifts of 2.08x and 2.16x over the dataset baseline.
Under rigorous time-aware evaluation, LightGBM achieves the highest test AUPRC of
0.1296, outperforming Logistic Regression (0.1123) and LSTM (0.1167). SHAP analysis
identifies reading remark code 9 as the dominant predictor (22.55% gain importance), and
reveals that account lifecycle features outperform raw consumption metrics. We quantify
temporal concept drift (fraud prevalence rising from 2.53% in the training period to 6.46% in the test
period) and demonstrate that aggregated statistical features capture sequential patterns as
effectively as LSTM modelling under time-aware conditions.
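As an illustration of the billing arithmetic audit and the fraud rate lift described above, the following is a minimal sketch rather than the pipeline used in the paper. The field names (`old_index`, `new_index`, `billed_consumption`, `meter_active`), the client-level aggregation rule (a client is flagged if any of its invoices is flagged), and the four toy invoices are all assumptions made for the example; none of them come from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    # Hypothetical invoice schema; the real dataset's columns may differ.
    client_id: int
    old_index: int            # meter reading at the start of the billing period
    new_index: int            # meter reading at the end of the billing period
    billed_consumption: int   # consumption actually invoiced
    meter_active: bool        # whether the meter is recorded as working


def index_delta_mismatch(inv: Invoice) -> bool:
    """Flag invoices whose billed consumption disagrees with the index delta."""
    return (inv.new_index - inv.old_index) != inv.billed_consumption


def active_meter_zero_consumption(inv: Invoice) -> bool:
    """Flag invoices where an active meter reports zero billed consumption."""
    return inv.meter_active and inv.billed_consumption == 0


def client_flags(invoices, rule):
    """Aggregate to client level: flagged if any invoice trips the rule (assumed)."""
    flags = {}
    for inv in invoices:
        flags[inv.client_id] = flags.get(inv.client_id, False) or rule(inv)
    return flags


def fraud_rate_lift(flags, is_fraud):
    """Fraud rate among flagged clients divided by the overall fraud rate."""
    base_rate = sum(is_fraud.values()) / len(is_fraud)
    flagged = [cid for cid, hit in flags.items() if hit]
    if not flagged or base_rate == 0:
        return float("nan")
    flagged_rate = sum(is_fraud[cid] for cid in flagged) / len(flagged)
    return flagged_rate / base_rate


# Toy data, invented purely for illustration.
invoices = [
    Invoice(1, 100, 150, 50, True),   # consistent
    Invoice(2, 200, 260, 40, True),   # delta is 60 but 40 was billed
    Invoice(3, 300, 300, 0, True),    # active meter, zero consumption
    Invoice(4, 400, 430, 30, False),  # consistent, inactive meter
]
is_fraud = {1: False, 2: True, 3: False, 4: False}

mismatch_flags = client_flags(invoices, index_delta_mismatch)
zero_flags = client_flags(invoices, active_meter_zero_consumption)

mismatch_lift = fraud_rate_lift(mismatch_flags, is_fraud)
zero_lift = fraud_rate_lift(zero_flags, is_fraud)
```

Under this definition, a lift of 2.08x would mean that clients tripping the flag are labelled fraudulent at roughly twice the rate of the overall client population.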
Submission Number: 2