ZipZap: Efficient Training of Language Models for Ethereum Fraud Detection

Published: 23 Jan 2024, Last Modified: 23 May 2024, TheWebConf24 Oral
Keywords: efficient training of language models, Ethereum fraud detection
TL;DR: We introduce a framework that delivers parameter and computational efficiency for training language models tailored for detecting fraud on Ethereum.
Abstract: Language models (LMs) have demonstrated superior performance in detecting fraudulent activities on Ethereum. Nonetheless, the sheer volume of Ethereum data incurs excessive memory and computational costs when training LMs from scratch, limiting their ability to scale for practical applications. In this paper, we present ZipZap, a framework tailored to achieve both parameter and computational efficiency when training LMs on Ethereum-centric data. First, through \textit{frequency-aware} compression, ZipZap compresses an LM down to a mere 6\% of its original size with an imperceptible performance dip. This technique ties the embedding dimension of an address to its occurrence frequency in the dataset, motivated by the observation that embeddings of low-frequency addresses are insufficiently trained, which negates the need for a uniformly large dimension for knowledge representation. Second, ZipZap accelerates training through an \textit{asymmetric} training paradigm: it applies transaction dropping and cross-layer parameter sharing to expedite pre-training, while reverting to the standard paradigm for fine-tuning to strike a balance between efficiency and efficacy; this design is motivated by the observation that the optimization goals of pre-training and fine-tuning are inconsistent. In addition, extensive evaluations on real-world, large-scale datasets demonstrate that ZipZap delivers notable parameter and computational efficiency gains for LMs tailored to Ethereum data.
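The frequency-aware compression idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bucket thresholds, dimensions, and helper names (`bucket_dim`, `lookup`) are all assumptions. Each address gets a compact embedding whose size depends on how often it appears, and a per-bucket linear projection maps every embedding into the shared model dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
model_dim = 64  # shared hidden size of the LM (assumed value)

# Hypothetical (min_count, embedding_dim) buckets, sorted by min_count
# descending: frequent addresses keep the full dimension, rare ones shrink.
buckets = [(1000, 64), (100, 32), (0, 8)]

def bucket_dim(freq):
    """Return the embedding dimension for an address seen `freq` times."""
    for min_count, dim in buckets:
        if freq >= min_count:
            return dim

# Toy vocabulary: address -> occurrence count in the transaction corpus.
address_freq = {"0xabc": 5000, "0xdef": 250, "0x123": 3}

# Compact per-address embeddings plus one projection matrix per bucket
# (the projection is shared across all addresses in a bucket).
embeddings = {a: rng.standard_normal(bucket_dim(f))
              for a, f in address_freq.items()}
projections = {dim: rng.standard_normal((dim, model_dim))
               for _, dim in buckets}

def lookup(address):
    """Project an address's compact embedding into the shared model space."""
    e = embeddings[address]
    return e @ projections[len(e)]

# Parameter count of the embedding table vs. a uniform full-dimension table.
compact_params = sum(e.size for e in embeddings.values())
full_params = len(embeddings) * model_dim
```

Every `lookup` result has shape `(model_dim,)` regardless of bucket, so downstream transformer layers are unchanged; the savings come entirely from the embedding table, which the abstract identifies as the dominant cost for the large Ethereum address vocabulary.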
Track: Systems and Infrastructure for Web, Mobile, and WoT
Submission Guidelines Scope: Yes
Submission Guidelines Blind: Yes
Submission Guidelines Format: Yes
Submission Guidelines Limit: Yes
Submission Guidelines Authorship: Yes
Student Author: Yes
Submission Number: 247