A Cross-Silo Vulnerability Federated Learning Approach Based on Content Chunking

Weisheng Zhang, Jiapeng Zhang, Siyang Yu, Mingxing Duan, Kenli Li

Published: 01 Jan 2025, Last Modified: 12 Nov 2025 | IEEE Internet Things J. 2025 | CC BY-SA 4.0

Abstract: The proliferation of vulnerable code poses a significant threat to software system security and user privacy. Because manual vulnerability analysis is inefficient, interest in automating vulnerability management with machine learning has surged. However, the scarcity of large-scale, publicly accessible vulnerability datasets impedes progress on automated methods. Federated learning makes it possible to learn from private data, but ensuring privacy and security within this paradigm presents a new challenge. To address this problem, we introduce a new approach called vulnerability solution with abstract syntax tree (AST), SOEHash, and clustering (V-ASC). We first derive the AST of the vulnerable code to capture the underlying pattern of the vulnerability. To protect data privacy while extracting vector features, we process the AST with the SOEHash algorithm. Finally, to speed up similarity comparison between vectors, we use an unsupervised clustering algorithm to group the set of vectors into individual vulnerability clusters. Experiments on a recent vulnerability code dataset validate the effectiveness and efficiency of V-ASC.
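
The three stages named in the abstract (AST extraction, hashing into vectors, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify SOEHash, so plain feature hashing of AST node-type bigrams stands in for it, and a greedy cosine-threshold grouping stands in for the unnamed unsupervised clustering algorithm. The use of Python's `ast` module, the vector size `DIM`, the threshold of 0.9, and all function names are assumptions made for illustration only.

```python
import ast
import hashlib
import math

DIM = 64  # hypothetical vector size, not taken from the paper


def ast_node_types(source):
    """Return the AST node type names of a Python snippet in traversal order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def hashed_vector(tokens, dim=DIM):
    """Feature-hash token bigrams into a fixed-length count vector.

    Stand-in for SOEHash: only hashed structure leaves the client,
    never identifiers or literal values.
    """
    vec = [0.0] * dim
    for a, b in zip(tokens, tokens[1:]):
        digest = hashlib.sha256(f"{a}>{b}".encode()).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def greedy_clusters(vectors, threshold=0.9):
    """Assign each vector to the first cluster whose representative is similar enough.

    Stand-in for the unsupervised clustering step; a new cluster is opened
    whenever no existing representative reaches the threshold.
    """
    reps, labels = [], []
    for vec in vectors:
        for idx, rep in enumerate(reps):
            if cosine(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            reps.append(vec)
            labels.append(len(reps) - 1)
    return labels


snippets = [
    "def f(a, b):\n    return a + b\n",
    "def g(x, y):\n    return x + y\n",
    "for i in range(10):\n    print(i)\n",
]
vectors = [hashed_vector(ast_node_types(s)) for s in snippets]
labels = greedy_clusters(vectors)
print(labels)
```

The first two snippets differ only in identifier names, so they produce the same node-type sequence, the same hashed vector, and the same cluster label. Comparing a new sample against one representative per cluster, rather than against every stored vector, is the kind of speed-up the clustering step is meant to provide.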

External IDs: dblp:journals/iotj/ZhangZYDL25