Keywords: Blockchain, Bitcoin, Shared Send Mixer, Semi-Supervised Learning
Abstract: Detecting illicit cryptocurrency transactions is hampered by extreme class imbalance, adversarial obfuscation, and a scarcity of reliable labels. Semi-supervised learning (SSL) offers a promising remedy by leveraging unlabeled data, but we show that its success depends on data quality rather than data volume alone. We introduce an SSL framework for identifying illicit flows in Bitcoin's Shared Send Mixers (SSMs) and make three contributions: (1) the first complete historical dataset of 163 million Bitcoin transactions with SSM classification; (2) novel, high-fidelity features, namely KeyLinker address clustering and Shared Send Untangling (SSU) complexity metrics, designed to capture mixing structures and improve data quality; and (3) a demonstration that SSL effectively leverages unlabeled data (F1-score: 0.84) precisely when guided by these quality-focused features. Crucially, we show that common heuristics such as One-Time Change (OTC), though abundant, introduce noise, whereas strategic reliance on higher-fidelity features such as KeyLinker is essential. Our work establishes that, in blockchain forensics, the path to better performance lies in smarter feature engineering for data quality, not just larger datasets.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9650