Dirty-Data Impacts on Regression Models: An Experimental Evaluation

2021 (modified: 13 Jun 2021), DASFAA (1) 2021, Readers: Everyone
Abstract: Data quality issues have attracted widespread attention because dirty data degrades the results of regression models. The relationship between data quality and result accuracy can guide both the selection of an appropriate regression model for a given level of data quality and the decision of what share of the data to clean. However, little research has explored this relationship. Motivated by this, we design a generalized framework to evaluate the impacts of dirty data on models. Using the framework, we conduct an experimental evaluation of the effects of missing, inconsistent, and conflicting data on regression models. Based on the experimental findings, we provide guidelines for regression model selection and data cleaning.