Blockchain-based Data Quality Assessment to Improve Distributed Machine Learning

Published: 01 Jan 2023, Last Modified: 12 May 2023, ICNC 2023, Readers: Everyone
Abstract: Data quality assessment is critical for distributed machine learning (DML). Data collected from heterogeneous Internet of Things (IoT) devices may contain biased information that decreases the prediction accuracy of DML models. To address this challenge, we propose a blockchain-based approach to assess the quality of data that are not independent and identically distributed (non-IID). A blockchain running atop mobile edge computing (MEC) helps protect the privacy, security, and integrity of healthcare data when IoT devices are connected to MEC servers. Therefore, it is critical to integrate a data quality assessment module into the blockchain when building a blockchain-enabled DML system. In this paper, we jointly consider the information loss and the marginal utility of non-IID data samples. Specifically, we use Kullback-Leibler (KL) divergence to evaluate the information loss between IID and non-IID data samples, and we apply the reciprocal of data quantity to model the marginal utility of data samples. Human activity recognition and handwritten digit recognition data sets are used for performance evaluation. Experiments show that our proposed scheme outperforms the benchmarks in model test accuracy on various non-IID data samples.
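The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the IID reference is a uniform label distribution, measures information loss as D_KL(P_local || Q_uniform) (the abstract does not state the direction), and takes data quantity to be the number of local samples. The function names `label_distribution`, `kl_divergence`, and `marginal_utility` are hypothetical, and how the paper combines the two terms into a single quality score is not specified here, so the sketch leaves them separate.

```python
import math
from collections import Counter


def label_distribution(labels, num_classes):
    """Empirical class distribution of a device's local labels (hypothetical helper)."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(c, 0) / n for c in range(num_classes)]


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), in nats.

    Terms with p_i == 0 contribute 0 by convention; q is assumed
    strictly positive (true for the uniform reference used below).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def marginal_utility(num_samples):
    """Reciprocal of data quantity, as described in the abstract."""
    return 1.0 / num_samples


# Assumption: an IID device would see all classes equally often.
num_classes = 2
uniform = [1.0 / num_classes] * num_classes

balanced_labels = [0, 0, 1, 1]   # matches the uniform reference
skewed_labels = [0, 0, 0, 0]     # single-class, strongly non-IID

balanced_loss = kl_divergence(label_distribution(balanced_labels, num_classes), uniform)
skewed_loss = kl_divergence(label_distribution(skewed_labels, num_classes), uniform)
utility = marginal_utility(len(skewed_labels))
```

Under these assumptions the balanced device has zero information loss, while the single-class device has a loss of log 2 nats; the marginal utility term shrinks as a device contributes more samples.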