Unsupervised Functional Dependency Discovery for Data Preparation

Published: 17 Apr 2019, Last Modified: 05 May 2023, LLD 2019, Readers: Everyone
Keywords: Functional Dependencies, Sparse Regression, Structure Learning, L1-regularization, Weak Supervision
Abstract: We study the problem of functional dependency (FD) discovery to impose domain knowledge for downstream data preparation tasks. We introduce a framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that our methods scale to large data instances with millions of tuples and hundreds of attributes while recovering true FDs across a diverse array of synthetic datasets, even in the presence of noisy data. Overall, our methods show an average F1 improvement of 2× over state-of-the-art FD discovery methods. Our system also obtains better F1 in a downstream data repair task than manually defined FDs.
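To make the link between FD discovery and sparse regression concrete, here is a minimal toy sketch. It is not the authors' algorithm. It assumes one possible encoding (a binary "do these two tuples agree on attribute k?" feature for every pair of tuples) and fits a plain L1-regularized least-squares model by coordinate descent. The function names (`agreement_features`, `lasso`), the toy table, and the value of `lam` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def agreement_features(rows, attrs):
    """For every pair of tuples, emit 1.0 per attribute if the pair agrees on it, else 0.0."""
    return [[1.0 if a[k] == b[k] else 0.0 for k in attrs]
            for a, b in combinations(rows, 2)]


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso(X, y, lam, sweeps=200):
    """Minimize (1/2n)*||y - Xw||^2 + lam*||w||_1 on centered data via coordinate descent."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # residual of the all-zero model
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no signal
            # Covariance of feature j with the residual that excludes j's own contribution.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam) / z
            residual = [r - c * (new_w - w[j]) for c, r in zip(col, residual)]
            w[j] = new_w
    return w


# Hypothetical toy table in which zip determines city and name is unrelated.
rows = [
    {"zip": "10001", "city": "NYC", "name": "ann"},
    {"zip": "10001", "city": "NYC", "name": "bob"},
    {"zip": "60601", "city": "CHI", "name": "ann"},
    {"zip": "60601", "city": "CHI", "name": "bob"},
    {"zip": "94105", "city": "SF", "name": "ann"},
    {"zip": "94105", "city": "SF", "name": "bob"},
]
candidates = ["zip", "name"]
X = agreement_features(rows, candidates)
y = [row[0] for row in agreement_features(rows, ["city"])]

weights = lasso(X, y, lam=0.02)
# Attributes that keep a nonzero weight are the candidate determinants of city.
determinants = [attr for attr, wt in zip(candidates, weights) if wt != 0.0]
print(determinants)
```

On this toy table the L1 penalty drives the weight of `name` to exactly zero and leaves only `zip` with a nonzero weight, which is the sense in which a sparse solution can be read as a candidate dependency zip → city. Real data would need a tuned penalty, a scalable solver, and handling of noise, none of which this sketch attempts.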