Visual Analytics System of Comprehensive Data Quality Improvement for Machine Learning using Data- and Process-driven Strategies

Published: 01 Jan 2022, Last Modified: 30 Jul 2025. Venue: IEEE Big Data 2022. License: CC BY-SA 4.0
Abstract: Machine learning (ML) models are used to mine inconspicuous information in big data. The performance of an ML model depends on both the quality of the model and the quality of its data. However, modifying the ML model while measuring its performance is impractical, and low-quality data leads to biased model training. Improving data quality is therefore essential. Visual analytics systems supporting data quality improvement (DQI) have been proposed previously. However, these systems make it difficult for users to assess comprehensive DQI methods for ML and to determine an appropriate DQI process. In this paper, we propose a novel visual analytics system for managing the quality of data used in ML models.