Leveraging Ensemble-Based Semi-Supervised Learning for Illicit Account Detection in Ethereum DeFi Transactions
Keywords: Decentralized Finance, Ethereum Blockchain, Illicit Account Detection, Self-Learning, Semi-Supervised Learning
Abstract: The advent of smart contracts has enabled the rapid rise of Decentralized Finance (DeFi) on the Ethereum blockchain, offering substantial rewards in financial innovation and inclusivity. This growth, however, is accompanied by significant security risks such as illicit accounts engaged in fraud. Effective detection is further limited by the scarcity of labeled data and the evolving tactics of malicious accounts. To address these challenges with a robust solution for safeguarding the DeFi ecosystem, we propose $\textbf{SLEID}$, a $\textbf{S}$elf-$\textbf{L}$earning $\textbf{E}$nsemble-based $\textbf{I}$llicit account $\textbf{D}$etection framework. SLEID uses an Isolation Forest model for initial outlier detection and a self-training mechanism to iteratively generate pseudo-labels for unlabeled accounts, enhancing detection accuracy. Experiments on 6,903,860 Ethereum transactions with extensive DeFi interaction coverage demonstrate that SLEID significantly outperforms supervised and semi-supervised baselines with $\textbf{+2.56}$ percentage-point precision, comparable recall, and $\textbf{+0.90}$ percentage-point F1---particularly for the minority illicit class---alongside $\textbf{+3.74}$ percentage-points higher accuracy and improvements in PR-AUC, while substantially reducing reliance on labeled data.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 24189
Loading