Using Explainable Boosting Machines (EBMs) to Detect Common Flaws in Data

Zhi Chen, Sarah Tan, Harsha Nori, Kori Inkpen, Yin Lou, Rich Caruana

2021 (modified: 15 Nov 2022), PKDD/ECML Workshops (1) 2021, Readers: Everyone

Abstract: Every dataset is flawed, often in surprising ways that data scientists do not anticipate. Popular machine learning methods, however, are mostly black boxes. Because they lack interpretability, they can learn defective knowledge from flawed datasets, and this is difficult to detect. In this work, we show how interpretable machine learning methods such as EBMs can help users detect problems lurking in their data. Specifically, we present a number of case studies in which EBMs reveal common types of dataset flaws, including missing values, confounding and treatment effects, data drift, bias and fairness issues, and outliers. In each case study, we analyze the flaws by visualizing EBM shape functions and interpreting them with domain knowledge. We also demonstrate that, in some cases, interpretable learning methods such as EBMs provide simple tools for correcting problems when correcting the data itself is difficult.
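
The idea of reading a data flaw off a shape function can be sketched on the missing-values case the abstract mentions. The sketch below is not an EBM and not the authors' method: it uses a per-bin mean of the target as a crude, single-feature stand-in for a learned shape function. The synthetic data, the mean-imputation scenario, and the helper names `binned_shape` and `spike_bins` are all assumptions made for illustration.

```python
import numpy as np

def binned_shape(x, y, n_bins=20):
    """Crude stand-in for a shape function: mean of y per equal-width bin of x, centered."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return edges, sums / np.maximum(counts, 1) - y.mean()

def spike_bins(shape, threshold):
    """Interior bins that deviate from the median of their 3-bin window by more than threshold."""
    windows = np.lib.stride_tricks.sliding_window_view(shape, 3)
    deviation = np.abs(shape[1:-1] - np.median(windows, axis=1))
    return np.flatnonzero(deviation > threshold) + 1

# Synthetic data (assumption): the outcome rises linearly with the feature,
# but rows where the feature is missing have a systematically higher outcome.
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(0, 10, n)
y = 0.5 * x + rng.normal(0, 1, n)
missing = rng.random(n) < 0.2
y[missing] = 8.0 + rng.normal(0, 1, missing.sum())

# Mean imputation piles every missing row onto a single feature value.
imputed_value = x[~missing].mean()
x_imputed = np.where(missing, imputed_value, x)

edges, shape = binned_shape(x_imputed, y)
flagged = spike_bins(shape, threshold=1.0)
for b in flagged:
    print(f"suspicious bin [{edges[b]:.2f}, {edges[b + 1]:.2f}): shape value {shape[b]:+.2f}")
```

In this toy setting the otherwise smooth curve shows an isolated jump in the bin that contains the imputed value, which is the kind of visual cue a practitioner would then interpret with domain knowledge. A real EBM, for example `ExplainableBoostingClassifier` in the InterpretML `interpret` package, learns its shape functions by boosting rather than by simple binning, so the sketch only conveys the inspection step.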
