Refining Heuristic-Based Bitcoin Address Clustering with Graph Neural Networks

ICLR 2026 Conference Submission18907 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Graph Neural Networks, Representation Learning, Clustering, Hierarchical Clustering, Bitcoin
TL;DR: Using GNNs and contrastive learning to learn embeddings suitable for Bitcoin address clustering.
Abstract: Bitcoin’s pseudonymous nature makes it challenging to analyze user-level activity, since a single user may control multiple identifiers (addresses). Existing heuristic-based methods attempt to identify addresses belonging to the same user, but they often produce flat cluster assignments with limited modularity and are prone to errors such as merging different users together. In this work, we propose a method for refining heuristically obtained clusters by grounding the clustering in contrastive embeddings produced by graph neural networks. Our contribution is threefold: (i) we release a publicly available dataset of Bitcoin transaction graphs containing a substantial number of clusters; (ii) we propose a methodology for learning address embeddings consistent with heuristics, and support it with theoretical foundations and empirical results; (iii) through hierarchical clustering, we enable a finer-grained analysis of heuristic clusters and provide a quantitative criterion for flagging suspicious merges.
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18907